Exploding the idea of a database

#

In a business application, the database is just about the most important and valuable component. It’s where your customers’ data is kept, and your customers’ data is your biggest asset and liability.

In a typical tiered architecture, the database sits at the bottom of everything. It is on the critical path of almost any interaction a user has with the system. Making it fast makes the system fast. Making it reliable makes the system reliable.

Conversely, when there’s a problem with your app, the database is probably in the middle of it. Doing anything of value requires wrangling these highly complex servers, usually via SQL. Of course databases would be a target for developers to try to improve things, as they are a huge part of our daily work.

It feels like there’s a lot going on right now in the database/data storage world.

There are tons of new types of database which tradeoff points in the design space:

For a long time I’ve been unhappy that SQL is the default interface to the database we use at work (MySQL). It’s a language designed for humans to write, not for machine-to-machine communication.

I recently learned about SQLite’s virtual machine: basically, a SQL query is “compiled” into a program in a bytecode language which includes both low-level primitives like loading and storing to registers, as well as high-level operations like creating a btree.

(Interestingly, limbo tests their bytecode generation against SQLite’s as they are trying to be SQLite compatible.)

This made me wonder if SQL could be replaced by a bytecode interface for database clients. Developers wouldn’t want to write bytecode directly, but if we’re already using ORMs to construct SQL expressions, which get parsed by the database back into bytecode anyway, why not skip out the middle-man and have the ORM product bytecode?

Why bother? I dunno, it just feels better than treating a business analytics language from the 70s as a compilation target. It’d be nice if I could bulk insert 1000 rows without having to prepare a statement containing 1000*num_columns placeholder question marks first.

For the forseeable future, we’re sticking with SQL. But I can dream.


Durable execution versus session backends

#

There are two up-and-coming architectural paradigms for app development in the web world:

In this post I’m going to lay out my understanding of the similarities and differences between them.

This was prompted by me discussing CloudFlare’s Durable Objects product in the Changelog Zulip, and someone thinking I was talking about durable execution, when I was actually trying to talk about session backends.

There is some overlap between products which can be used to build either architecture, so I want to try to clarify the situation.

I have been waiting for somebody to clone Durable Objects so I could build session backends without so much vendor lockin, so I have a particular interest in the landscape around these concepts.

Durable execution

Durable execution is described by Temporal as:

Durable execution systems run our code in a way that persists each step the code takes. If the process or container running the code dies, the code automatically continues running in another process with all state intact, including call stack and local variables.

https://temporal.io/blog/building-reliable-distributed-systems-in-node

Durable execution frameworks, also sometimes called workflows-as-code, evolved from “workflow engine” systems which expressed sequences and branching logic as state machines using GUIs or configuration languages like XML or JSON.

Examples of durable execution products include Temporal, Azure Durable Functions, and Restate among many others. In this post from last year, Chris Riccomini gives an overview of the durable execution startup landscape.

Temporal’s example apps include order fulfilment, money transfers, lambda function orchestration for stock trading, and background checks.

A hallmark of durable execution systems is that a single “function” or “workflow” can be very long-running compared to usual web request handlers or queue jobs.

The Temporal post above gives this example of waiting for some time to send a “rate your delivery” message:

await sleep('1 hour');
await sendPushNotification('✍️ Rate your meal');

Session backends

Jamsocket describes a session backend as:

Spin up a dedicated process on the server for each user, and maintain state in that process. Give the frontend a persistent connection to its dedicated process, and shut the process down when the user closes the app.

https://jamsocket.com/blog/session-backends

This is in contrast to the usual “stateless” approach to building web apps where “every request sent from the frontend to the backend contains all the information needed to fulfill it, or at least, contains enough context for the server to locate that information in a database.”

It is also different to a heavily stateful client, like an editor or game. This is the territory of single-page-app frontend frameworks and state management libraries. (But session backends usually go hand-in-hand with a heavily stateful client.)

When building realtime apps, the database becomes the bottleneck, as described in this excellent post:

Keeping our request handlers stateless, does NOT really solve the scalability problem; instead – it merely pushes it to the database.

Sure, if we’re working in a classical Huge-Company environment, we (as app developers) can say “it is not our problem anymore” (washing our hands of the matter Pilate-style). However, if our aim is not only to cover our ***es to keep our current job while the project goes down, but rather want (in a true DevOps spirit) to make sure that the whole system succeeds – we need to think a bit further.

http://ithare.com/scaling-stateful-objects/

(This post contains an example of trying to shoehorn this kind of realtime app into AWS: CloudFlare’s durable multiplayer moat.)

Figma, in their engineering blog, describes in some detail how they built their own “session backend” for their multiplayer editor:

We use a client/server architecture where Figma clients are web pages that talk with a cluster of servers over WebSockets. Our servers currently spin up a separate process for each multiplayer document which everyone editing that document connects to. If you’re interested in learning more, this article talks about how we scale our production multiplayer servers.

https://www.figma.com/blog/how-figmas-multiplayer-technology-works/

Session backend products include Jamsocket and their Plane project, CloudFlare Durable Objects, and maybe Fly Machines if you built some more coordination on top of them?

CloudFlare’s Durable Object documentation describes the following use-cases for session backends:

Use Durable Objects to build collaborative editing tools, interactive chat, multiplayer games and applications that need coordination among multiple clients, without requiring you to build serialization and coordination primitives on your own.

A hallmark of session backend systems is the ability for multiple clients to open realtime connections (i.e. WebSockets) to a unique “room” or “object” and coordinate or collaborate.

Here’s an example of broadcasting a message to all connected clients from CloudFlare’s chat example app:

// Iterate over all the sessions sending them messages.
let quitters = [];
this.sessions.forEach((session, webSocket) => {
  if (session.name) {
    try {
      webSocket.send(message);
    } catch (err) {
      quitters.push(session);
    }
  } else {
    // This session hasn't sent the initial user info message yet, so we're not sending them
    // messages yet (no secret lurking!). Queue the message to be sent later.
    session.blockedMessages.push(message);
  }
});

Having in-memory state for things like the list of connected clients and their WebSocket endpoint makes realtime coordination code easier to write.

Comparison

Where it gets confusing is that some products overlap these concepts.

CloudFlare Durable Objects are both a uniquely-addressable process, and also a persistent database unique to each instance. You could use this per-object database for durable execution - and in fact, this is what CloudFlare themselves did, building Workflows on top of Durable Objects.

Going in the other direction, Rivet built what they refer to as an “actor” framework for session backends on top of their durable execution engine - doing it the other way around!

It seems to me that the similarity is in the “private state” which allows a “session” to be identified persistently even when it isn’t currently running.

This allows a “session” to correspond 1-to-1 with some other persistent entity or concept like a “document”, “project”, “chat room” or “game”. Durable execution needs that same “private storage” to remember the execution log of each unique invocation of a workflow.

There are some players in this space I can’t place due to unfamiliarity. For example, Microsoft Orleans seems to have lots of similarities to Durable Objects, but I can’t see any documentation about connecting to an instance of an actor/grain. Akka’s cluster routing seems to provide some similar capabilities.

Bonus: Erlang

When either of these concepts are discussed, Erlang/Elixir users like to drop in and say something to the effect of

OTP does all of this already, thank goodness I don’t have to work with these complicated products!

My experience with Erlang is restricted to learning me some Erlang, and I have had a tiny bit of Elixir experience on a basic web app. I wish I could better evaluate claims like this with some real experience.

From what I have been able to glean, it would still take some work to have as turnkey an experience as CloudFlare’s globally distributed objects. Of course, it’s the global routing that’s the hard part, not as much the management of the “session” process once a request has arrived at it.

Bonus 2: actors

Speaking of Erlang, where do “actors” fit into all this?

The “Actor Model” seems to be a fairly broad term, but I think it most closely fits the “session backend” concept more than “durable execution”. But with the connotations that it’s more used for “business logic” than it is for, e.g. a video game instance with high-performance needs.

The concept of unique identity doesn’t seem to be an integral part of the actor model. But it sort of needs to be in the current session backend architectural paradigm, because session-actors are exposed to outside client communications using a unique identifier.


What Laravel's encrypted casts can protect you from

#

The other week I asked a very pointy question on the PHP Australia Slack channel:

Real talk, is encrypted casting security theater?

My question was half genuine confusion, and half Cunningham’s Law.

Encrypted casting is a feature in Laravel’s Eloquent ORM which allows us to very transparently encrypt a value in one database column of a model. This encryption happens in the PHP code, using the global app key.

The encryption happens inside the ORM layer, so our app code doesn’t even need to know about it. It felt so transparent, to me, that it may as well not exist.

Beyond that, and of course the usual high standard of security that app servers are held to to prevent SQL injection or other vulnerabilities, I wasn’t seeing the benefit of encrypting specific colums more. But the good folks of PHP Australia provided me with some great insight into what other threats this feature can mitigate.

Thanks to Andrew, Pat, valorin, James and Samuel for helping me think through this. I appreciate your patient explanations!


It's a 1+n problem, not a n+1 problem

#

This is probably my most trivial tech pet peeve and I want to get it off my chest.

I have been perplexed at the common use of the term “n+1 query problem”, maybe popularised by this page on it in the SQLite docs:

Many Small Queries Are Efficient In SQLite

While the page’s content is interesting, for as long as it has been on the Wayback machine it has used the phrase “N+1 Queries Are Not A Problem With SQLite”:

https://web.archive.org/web/20240000000000*/https://sqlite.org/np1queryprob.html

The problem is typically caused by code that looks something like this:

// select * from posts...
$posts = Post::all();
foreach ($posts as $post) {
    // select * from users where id = ...
    $user = $post->author();
    // use $user...
}

This code will perform a single database query returning n posts, then for each post it will run a query returning one user, for a total of n additional queries. That’s 1 and then n round-trips to and from the database.

In the typical three-tier, remote database server setup that most webapps use, this is inefficient. (Not so when using SQLite, which is the point of their article above.) There are many ways to fix this kind of problem, depending on the framework or ORM you’re using.

Referring to this as an “n+1 problem” annoys me because the problem is that you do one query first, which is fine, then n queries afterwards, which are the problem. This phrasing emphasizes the 1, which isn’t the problem!

I wouldn’t die on this molehill, but I use the phrase “1+n problem” instead and I hope it catches on.


Why there should only be 5 HTTP response codes

#

I’ve developed a strong opinion about exceptions over the last few years. There are only two pieces of information that all exceptions should include:

When the client and the provider are not on the same software team, these are the only two answers that determine how an error can be handled.

Now, errors should include helpful details beyond just the answers to the questions above. For example, if the email input in a form is incorrect, the response must tell the user exactly which field is wrong and why it is wrong. But these details are pretty much always down to humans to interpret.

This leads me to the conclusion that there are only 5 necessary HTTP status codes:

*I’ll allow that the semantics of HTTP mean there probably needs to be more than 1 code for redirects. Though I reckon the differences between temporary and permanent redirects etc. could have been expressed just as well with response headers.

I’ve framed this in terms of HTTP response codes, but the same issues apply to exceptions inside your own code. When throwing an exception, ask yourself the questions: who needs to fix this, and when (if ever) should the client retry?


Lightning talk on JSON:API at PHP Sydney

#

Tonight I gave a lightning talk at the last PHP Sydney Meetup of the year.

The subject was “An Introduction to JSON:API”, which is the spec we use for our APIs at Pylon:

https://jsonapi.org

The slides are online here:

https://static.crabmusket.net/2024/json_api_php_sydney/

And the recording:

https://youtu.be/cc4N2TyXAKI?si=E5ypdUbZUYXspDV6&t=3895

It was the third talk I’d given at the meetup this year. The talk went well, despite me forgetting to bring the HDMI adapter port for my Framework laptop.

A few small things I forgot to say during the talk:

I was both relieved and very slightly disappointed nobody asked me afterwards about JSON Hyper-Schema (I guess they were just respecting my clearly-stated wishes).

On the subject of JSON:API, here are a few articles that have influenced me:


Lessons from the JSON-RPC mailing list

#

I’ve used JSON-RPC before as a simple request-response protocol for Websocket services. It is a simple way to avoid bikeshedding.

I was recently looking into tRPC to compare and contrast, and my research led me to the JSON-RPC mailing list. I discovered a lot of interesting ideas in there which you might not have appreciated if you’ve only used JSON-RPC in a superficial way, e.g. over HTTP to a server, or even over a Websocket like I was.

Often, someone would ask why they had to provide an id, since one HTTP request would always result in one HTTP response, so the id could just always be "1". The replies highlight both the centrality of id to the spec, and the real meaning of JSON-RPC being transport-agnostic:

Ajax-type one-off requests make this hard to see, but the id is vital to the client sorting out its many messages. It has little to do with the server, and probably never should.

https://groups.google.com/g/json-rpc/c/vOFAhPs_Caw/m/bMYHo4rdGHoJ

On the client living longer than the transport:

If the transport is handling the role the id currently fills, then yes… but that also means the client using the transport needs to understand that. If you a client makes 30 calls to 4 different servers, having the ids allows a central tabulation of the current call states. What you are suggesting would require that meaning to be injected or stored in the currently running transport instances. This does not work when you want to either freeze the client state, or have requests that outlive the transport instances.

https://groups.google.com/g/json-rpc/c/vOFAhPs_Caw/m/LiqTE-kTH7kJ

On the over-reliance on HTTP and especially GET:

Exactly. HTTP GET has been an issue with json-rpc all the way through, as HTTP GET is often misused… perhaps a thread specific to addressing this angle of it will lead to some clarity for all of us on how to move forward.

https://groups.google.com/g/json-rpc/c/vOFAhPs_Caw/m/LiqTE-kTH7kJ

You keep linking the request object to the transport instance life time and suggesting optimizations based on that view. json-rpc is used in places where that is not true and there is little gained from the “optimization”.

https://groups.google.com/g/json-rpc/c/vOFAhPs_Caw/m/FLCBaleT0UAJ

On notifications and empty responses,

if the function being called has no meaningful return value, adding an id in the call merely forces the server to add a useless NULL response

The NULL (or “void”) response isn’t useless - it allows the client to verify that the server received and processed the request. In some transports like HTTP this may be duplicative of the HTTP response which comes anyway, but in most other transports it is not.

https://groups.google.com/g/json-rpc/c/vOFAhPs_Caw/m/ozmF478rpTkJ

So not only might we be using a transport like Websockets, which doesn’t have request-response semantics, we might actually make requests that outlive the lifetime of a transport.

An easy example is: what if your Websocket disconnects due to a temporary network outage. If you’re used to an HTTP world, you might expect that all your in-flight requests must just be abandoned and retried. But JSON-RPC allows you to keep ahold of those ids and match up responses you get once you’ve re-established a new Websocket connection.

Going even further: you could build a sneakernet RPC system that doesn’t even rely on the internet. Or one that distributes requests and responses via gossip protocol. This goes back to JSON-RPC 1.0’s peer-to-peer roots.

(Actually doing that would involve a ton of complexity which is entirely outside the JSON-RPC spec. E.g. deciding how to serialise client state so it knows what to do with responses after a restart. I’m not sure why you would use JSON-RPC for that instead of building a sync engine. But you could!)

Another possibility is having asymmetic transports. For example, a browser-based client could send requests via HTTP, but receive responses via a separate Server Sent Events channel. The server could make use of EventSource’s Last-Event-Id header to ensure replies aren’t dropped while the client reconnects.

JSON-RPC is a very simple spec. The id is actually the most important part of it. Without id you might as well just send JSON over HTTP any old way.


Somebody please clone Cloudflare Durable Objects

#

I’ve been looking into how to build a Figma-like realtime multiplayer editor. These two articles go into a lot of detail about the efforts Figma had to expend to achieve it:

https://www.figma.com/blog/rust-in-production-at-figma/

https://www.figma.com/blog/how-figmas-multiplayer-technology-works/

While I was working on a series of prototypes based on the posts above, Cloudflare dropped a new product: Durable Objects, an extension of their Workers platform.

https://blog.cloudflare.com/introducing-workers-durable-objects/

As Paul puts it in this piece about the Durable Objects advantage:

Durable Objects provide a way to reliably connect two clients to the same object. Each object has a (string) name that is used when connecting to it, which typically corresponds to a “room” or “document ID” in the application itself.

In practice, this means that two clients can establish WebSocket connections using the same document ID, and the infrastructure will guarantee that they are connected to the same Durable Object instance.

This is the difference that makes Durable Objects so compelling for multiplayer use cases, but it’s also the reason that the Durable Object abstraction hasn’t been slapped on top of other FaaS platforms – it is a radical departure from core assumptions made by their scheduling and routing systems.

https://digest.browsertech.com/archive/browsertech-digest-cloudflares-durable/

The Durable Objects killer feature is the global uniqueness of each object, and the ability to address the object from a known string.

For example, say I have a “customer” object in my database and I want to associate a realtime session with each customer, e.g. for the purpose of adding realtime updates to the CRM interface for that customer.

A client can send a request to Cloudflare asking:

connect me via Websocket to the Object identified by "session:customer_<id>"

and when two clients make that same request with the same <id> on opposite sides of the world, both their Websockets will end up connected to the same Durable Object instance somewhere in Cloudflare.

I’ve been wishing another cloud provider would build similar functionality, but as Paul notes, it’s not simple. While edge workers (especially JavaScript ones) are fairly commoditised now, building on Durable Objects is still significant vendor lock-in.

Jamsocket deserves an honorable mention for building open-source tooling to self-host something very similar:

https://jamsocket.com/


Thoughts on git commits, branches, PRs and stacking

#

At work, we’ve been trying out Graphite for the past year or so. It’s a complement to our GitHub account which helps us manage stacked PRs.

After trying the “stacking” workflow in the context of our small team, I wrote up some thoughts internally on how we use git. Hopefully, this will help us think more clearly about how to use our version control system to maximum effect. I figured I’d post my reflection publicly, too.

Primitives in version control

Commits

A commit is an atomic change to the codebase. It can touch multiple files, if a single logical change is physically distributed across the filesystem.

Branches

A branch refers to the “tip” of a linked list of commits. Merge commits join two branches into one, so that commits form an acyclic graph.

Pull requests

A pull request is a social construct built atop branches. It is a “request” from one developer to merge one branch into another. (And yes, I prefer GitLab’s “merge request” terminology, but unfortunately it’s the less used term.)

Stacks

Stacks are a collection of related PRs, often with sequential dependency between them. E.g. the PR that implements a frontend feature depends on the PR that implements the backend for the feature. These two PRs could be managed using a stacking tool like Graphite.

The stated purposes of stacking workflows are:

Outcomes

Thinking this through reminded me of the importance of commits themselves. I’d like to try harder to create exemplary commits for my team, which should flow on into better PRs, more helpful git blame, and hopefully a reduced need for stacking tools.


Paradicms, a minimum viable CMS

#

I found out about Paradicms when searching for advice about working with RDF data in JavaScript.

(The name appears to be a combination of “paradigms” and “CMS”.)

The pitch is that they’re building a CMS for digital humanities projects like archives and museums in a minimal computing way.

RDF comes in between the data sources (e.g. a spreadsheet) and the static site generation. Spreadsheets in a conventional format are pulled into a linked data graph, which is then queried to create the data on each page. Critically, this happens at build time, not at page load time.

I like:

I worry about:

I wish RDF were more popular as a concept, with apps providing schemas/grammars for their data export.

I’d like to try out the Paradicms approach on some small projects for local groups I’m part of that aren’t technical. I hope this could be a way to responsibly deliver a small data-backed site or app to a group that aren’t coders.


Relational

#

There seem to be a common misconception about the meaning of the term “relational” (in the context of “relational databases”).

There are a few commonly-used phrases:

One of these things is not like the others.

A classic demonstration of the misunderstanding:

Collectively, multiple tables of data are called relational data because it is the relations, not just the individual datasets, that are important. Relations are always defined between a pair of tables.

https://r4ds.had.co.nz/relational-data.html

The Drizzle ORM also gets this annoyingly wrong, although at least they didn’t bake the word relation into the library’s API surface like Eloquent.

A relation, properly defined, is closely analogous to a table. Relational data is just data stored in tables.

If somebody asks you “if your data is relational”, they may be suffering from this misunderstanding.


Lessons in object-oriented design

#

I will be re-reading the following quotes from Sandi Metz’s Practical Object-Oriented Design every morning before I start coding:

Object-oriented design is about managing dependencies. It is a set of coding techniques that arrange dependencies such that objects can tolerate change. In the absence of design, unmanaged dependencies wreak havoc because objects know too much about one another. Changing one object forces change upon its collaborators, which in turn forces change upon its collaborators, ad infinitum. A seemingly insignificant enhancement can cause damage that radiates outward in overlapping concentric circles, ultimately leaving no code untouched.

This makes sense of why dependency injection containers exist, the principle of inversion of control, and how to split up classes into the right responsibilities.

Asserting that code should be easy to change is akin to stating that children should be polite; the statement is impossible to disagree with, yet it in no way helps a parent raise an agreeable child. The idea of easy is too broad; you need concrete definitions of easiness and specific criteria by which to judge code. If you define easy to change as

  • changes have no unexpected side effects,
  • small changes in requirements require correspondingly small changes in code,
  • existing code is easy to reuse, and
  • the easiest way to make a change is to add code that in itself is easy to change,

then the code you write should have the following qualities. Code should be

  • Transparent The consequences of change should be obvious in the code that is changing and in distant code that relies upon it.
  • Reasonable The cost of any change should be proportional to the benefits the change achieves.
  • Usable Existing code should be usable in new and unexpected contexts.
  • Exemplary The code itself should encourage those who change it to perpetuate these qualities.

Code that is Transparent, Reasonable, Usable, and Exemplary (TRUE) not only meets today’s needs but can also be changed to meet the needs of the future. The first step in creating code that is TRUE is to ensure that each class has a single, well-defined responsibility.

Optimising for unknown future requirements seems like a great way to go about writing code that’s usable today as well.