On booleans

In database design especially, and in software development more generally, it’s tempting to store a boolean as a “yes/no” piece of state. But in my experience, rarely is this a good idea.

Often a piece of information is only a boolean on the most cursory of inspections. Often it is really:

A binary enumeration that only coincidentally has two possible values. For example, filter functions traditionally use booleans to keep/reject elements. However, why not use a more descriptive type?

function filter<T>(
    array: T[],
    // Unclear what `true` and `false` mean:
    predicate: (item: T) => boolean,
) { ... }

function filter2<T>(
    array: T[],
    // Crystal clear
    predicate: (item: T) => "keep" | "drop",
) { ... }

An enumeration that actually has more than two values if you think about it. For example, “which side of the line AB is point X on?” could return a boolean, but that hides the important case that the point is exactly on the line. A function like this could more properly return "left" | "right" | "on-the-line", or 0 | 1 | -1 if we’re being terse.

Another case is that the enumeration currently has two values, but may plausibly include more values in the future.

A timestamp when something happened. deleted_at is the classic example for soft-deleting database entries, which provides strictly more information than an is_deleted boolean.

A range of values, often numeric. In an example from work, it’s typical to refer to a solar panel as “bifacial” if it can produce power from light shining on either side of its surface. In reality, “bifaciality” is a factor that describes how much back-facing light the panel can absorb. However, it’s often handy to use a boolean to quickly check “does this panel type have any bifaciality at all?”

The way I think about all these examples is: a boolean is often the answer to a question about the state, not the state itself. Each of these cases can have true-false questions asked about it (e.g. “should we keep item 3 in the array?” or “is point P on the left side of line AB?” or “is this database row deleted?”).

But the answer to the question never contains the whole truth which is captured by the richer data type.