Exploding the idea of a database

In a business application, the database is just about the most important and valuable component. It’s where your customers’ data is kept, and your customers’ data is your biggest asset and liability.

In a typical tiered architecture, the database sits at the bottom of everything. It is on the critical path of almost any interaction a user has with the system. Making it fast makes the system fast. Making it reliable makes the system reliable.

Conversely, when there’s a problem with your app, the database is probably in the middle of it. Doing anything of value requires wrangling these highly complex servers, usually via SQL. Of course databases would be a target for developers to try to improve things, as they are a huge part of our daily work.

It feels like there’s a lot going on right now in the database/data storage world.

There are tons of new types of database which tradeoff points in the design space:

For a long time I’ve been unhappy that SQL is the default interface to the database we use at work (MySQL). It’s a language designed for humans to write, not for machine-to-machine communication.

I recently learned about SQLite’s virtual machine: basically, a SQL query is “compiled” into a program in a bytecode language which includes both low-level primitives like loading and storing to registers, as well as high-level operations like creating a btree.

(Interestingly, limbo tests their bytecode generation against SQLite’s as they are trying to be SQLite compatible.)

This made me wonder if SQL could be replaced by a bytecode interface for database clients. Developers wouldn’t want to write bytecode directly, but if we’re already using ORMs to construct SQL expressions, which get parsed by the database back into bytecode anyway, why not skip out the middle-man and have the ORM product bytecode?

Why bother? I dunno, it just feels better than treating a business analytics language from the 70s as a compilation target. It’d be nice if I could bulk insert 1000 rows without having to prepare a statement containing 1000*num_columns placeholder question marks first.

For the forseeable future, we’re sticking with SQL. But I can dream.