Exploding the idea of a database

In a business application, the database is just about the most important and valuable component. It’s where your customers’ data is kept, and your customers’ data is your biggest asset and liability.

In a typical tiered architecture, the database sits at the bottom of everything. It is on the critical path of almost any interaction a user has with the system. Making it fast makes the system fast. Making it reliable makes the system reliable.

Conversely, when there’s a problem with your app, the database is probably in the middle of it. Doing anything of value requires wrangling these highly complex servers, usually via SQL. Of course databases would be a target for developers to try to improve things, as they are a huge part of our daily work.

It feels like there’s a lot going on right now in the database/data storage world.

While the boring technology crowd suggests you should just use Postgres,
there’s growing advocacy for using SQLite
and even approaches to making SQLite a distributed system.

There are tons of new types of database which tradeoff points in the design space:

Embedded databases are having a moment, especially the key-value variety;
SpacetimeDB embeds your code directly into the database, flipping the direction of embedding, but still merging storage and application in a way that acknowledges the similarity between entity-component-system architecture and the relational model;
CloudFlare’s Durable Objects co-locate code and data storage, creating tiny on-demand edge databases that can coordinate like actors;
New products are building on Apple’s FoundationDB distributed kv store which can be used to build all kinds of data storage on top of with its “layering” concept;
There’s an explosion of databases for use in web clients, especially for local-first applications, ranging from the minimal tuple-database experiment, through the world of CRDTs to the earthstar protocol.

For a long time I’ve been unhappy that SQL is the default interface to the database we use at work (MySQL). It’s a language designed for humans to write, not for machine-to-machine communication.

Here’s a thorough analysis of SQL’s issues.
There are attempts to invent a better query language for developers,
suggestions for how to make SQL itself better,
ORM layers which try to hide SQL from the developer entirely or at least make it easier to build parameterised SQL queries.

I recently learned about SQLite’s virtual machine: basically, a SQL query is “compiled” into a program in a bytecode language which includes both low-level primitives like loading and storing to registers, as well as high-level operations like creating a btree.

(Interestingly, limbo tests their bytecode generation against SQLite’s as they are trying to be SQLite compatible.)

This made me wonder if SQL could be replaced by a bytecode interface for database clients. Developers wouldn’t want to write bytecode directly, but if we’re already using ORMs to construct SQL expressions, which get parsed by the database back into bytecode anyway, why not skip out the middle-man and have the ORM product bytecode?

Redis executes Lua programs directly against the database, which is a very high-level language but motivates the idea that sometimes developers want to write code that runs inside the database;
LINQ demonstrates the value of an ORM built into a programming language which pulls query parsing etc. out of the database and into the client;
Here’s a Protobuf-based spec for exchanging query plans without SQL which seems to be used in some of the new database ecosystem products like Datafusion;
Jamie Brandon explores query compilation, which is not exactly related to the database’s API, but is relevant to implementation details like SQLite’s VM. (For more wild ideas about databases, programming languages and WASM: Have You Tried Rubbing A Database On It.)

Why bother? I dunno, it just feels better than treating a business analytics language from the 70s as a compilation target. It’d be nice if I could bulk insert 1000 rows without having to prepare a statement containing 1000*num_columns placeholder question marks first.

For the forseeable future, we’re sticking with SQL. But I can dream.

Filed under:coding