Musing on the reality of abstraction in software

We talk about “abstractions” a lot in software development.

Abstraction can sometimes mean “a way to hide details that aren’t relevant to helping me achieve my goals”. This is what we mean by “abstracting away” an implementation.

Sometimes, per Dijkstra, an abstraction might mean “not to be vague, but to create a new semantic level in which one can be absolutely precise”. This tantalising idea feels like doing real intellectual work, the kind involved in discovering things like two’s-complement integers or Prolog.

Sometimes (and this is what I want to write about), abstraction means “an entity which isn’t tangible, but is nevertheless real and significant”. This is the “abstract noun” sense of the word.

This is the sense David Deutsch used in a chapter of The Beginning of Infinity titled “The Reality of Abstractions”. He writes:

In regard to what is or is not real, this leads to the requirement that, if an entity is referred to by our best explanation in the relevant field, we must regard it as really existing.

He intends here to counter the reductionist idea in science that the lowest-level or most fundamental phenomena (like quantum mechanics) are the only things that are “real” or significant. He gives this example, comparing “high-level” explanations with a hypothetical reductionist explanation:

Consider one particular copper atom at the tip of the nose of the statue of Sir Winston Churchill that stands in Parliament Square in London. Let me try to explain why that copper atom is there. It is because Churchill served as prime minister in the House of Commons nearby; and because his ideas and leadership contributed to the Allied victory in the Second World War; and because it is customary to honour such people by putting up statues of them; and because bronze, a traditional material for such statues, contains copper, and so on. Thus we explain a low-level physical observation - the presence of a copper atom at a particular location - through extremely high-level theories about emergent phenomena such as ideas, leadership, war and tradition.

Presumably a reductive ‘theory of everything’ would in principle make a low-level prediction of the probability that such a statue will exist, given the condition of (say) the solar system at some earlier date. … But such descriptions and predictions would explain nothing. They would merely describe the trajectory that each copper atom followed from the copper mine, through the smelter and the sculptor’s studio and so on … In fact such a prediction would have to refer to atoms all over the planet, engaged in the complex motion we call the Second World War, among other things.

This might seem obvious? The reductionist approach to science isn’t one we encounter in our daily lives, and addressing it isn’t usually necessary.

But I keep thinking about the idea that “if an entity is referred to by our best explanation … we must regard it as really existing” and how it crops up when doing software design.

I don’t yet know exactly how to turn this vague idea into something graspable. It just keeps feeling relevant.

This often comes up when trying to understand a new part of our business domain at work.

- A concept that initially sounded like a simple relationship between two entities turns out to have a whole life of its own, full of complexity we previously elided.
- Two concepts that initially seemed similar turn out to have different underlying needs, and diverge.
- Database tables need careful thought to walk the line between “what do we know for sure?” and “what do we need to leave room for in the future?”

It also influences how I think about so-called “natural keys” in databases. Natural keys encourage thinking of a database row as transparent, nothing more than a reflection of some external, more fundamental reality. On the contrary, our “best explanation” of software behaviour often includes the existence (or absence) of a specific row in a database. That row deserves its own identity.
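To make the contrast concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. The `customers`/`orders` schema, the `orders_for` helper and the email addresses are all hypothetical, chosen only for illustration: each customer row gets a surrogate `id` of its own, while `email` is treated as an ordinary attribute that happens to be unique but is allowed to change.

```python
import sqlite3

# Hypothetical schema: the customer row has its own surrogate identity (id),
# while email is an external fact that is unique today but may change.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,   -- identity of the row itself
        email TEXT NOT NULL UNIQUE   -- external fact; may change
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    );
""")

cur = conn.execute("INSERT INTO customers (email) VALUES (?)", ("ada@example.com",))
customer_id = cur.lastrowid
conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))

# The external fact changes; the row, and everything referring to it, does not.
conn.execute(
    "UPDATE customers SET email = ? WHERE id = ?",
    ("ada@newmail.example", customer_id),
)


def orders_for(email):
    """Return the ids of orders placed by whichever customer currently has this email."""
    return conn.execute(
        "SELECT o.id FROM orders o JOIN customers c ON c.id = o.customer_id "
        "WHERE c.email = ?",
        (email,),
    ).fetchall()


print(orders_for("ada@newmail.example"))  # [(1,)]
```

Had `email` been the primary key, the same change would have to ripple through every referencing `orders` row (or lean on `ON UPDATE CASCADE`), because the row would have no identity apart from that external fact.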

In some writing on data-oriented design, a kind of “data reductionism” appears that sounds a lot like this:

Data is all we have. Data is what we need to transform in order to create a user experience. Data is what we load when we open a document. Data is the graphics on the screen, the pulses from the buttons on your gamepad, the cause of your speakers producing waves in the air … Abstraction heavy paradigms try to pretend the computer and its data do not exist at every turn, abstracting away the idea that there are bytes, or CPU pipelines, or other hardware features, and instead bringing the model of the problem into the program.

Data Oriented Design by Richard Fabian

It’s necessary to climb down off the top of the object-oriented ladder of abstraction sometimes. But I think the right outcome is to realise that what is needed is not no abstractions, but different ones.

Abstract notions like “quaternion” still exist, even if you are handling a packed table of floats instead of a tangled graph of pointers. As the problem of performance moved from minimising CPU cycles to maximising cache utilisation, the correct abstraction to address a concrete need may have changed. But retreating to “it’s all bytes!” is as insufficient as saying “what a peculiar arrangement of copper atoms you have there”.
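Here is a minimal sketch of that idea, written in Python for brevity: quaternions stored as a structure of arrays (one packed column of doubles per component, via the standard `array` module) rather than as individually allocated objects. The `QuaternionTable` name and its methods are hypothetical. The point is that `normalize_all` is still plainly a quaternion operation, even though it only ever touches flat columns of floats. Python itself will not deliver the cache behaviour this layout is designed for. In C or Rust the same shape would let each column be streamed through memory contiguously.

```python
import math
from array import array
from dataclasses import dataclass, field


def _column() -> array:
    # A packed, contiguous buffer of C doubles.
    return array("d")


@dataclass
class QuaternionTable:
    """Quaternions stored column-wise (structure of arrays), one column per component."""
    w: array = field(default_factory=_column)
    x: array = field(default_factory=_column)
    y: array = field(default_factory=_column)
    z: array = field(default_factory=_column)

    def append(self, w: float, x: float, y: float, z: float) -> None:
        self.w.append(w)
        self.x.append(x)
        self.y.append(y)
        self.z.append(z)

    def __len__(self) -> int:
        return len(self.w)

    def get(self, i: int) -> tuple[float, float, float, float]:
        return (self.w[i], self.x[i], self.y[i], self.z[i])

    def normalize_all(self) -> None:
        # "Make every quaternion unit length" is a quaternion-level idea,
        # carried out here as a sweep over four flat columns of floats.
        for i in range(len(self)):
            norm = math.sqrt(
                self.w[i] ** 2 + self.x[i] ** 2 + self.y[i] ** 2 + self.z[i] ** 2
            )
            if norm > 0.0:  # leave zero quaternions alone rather than divide by zero
                self.w[i] /= norm
                self.x[i] /= norm
                self.y[i] /= norm
                self.z[i] /= norm


table = QuaternionTable()
table.append(2.0, 0.0, 0.0, 0.0)
table.append(0.0, 3.0, 4.0, 0.0)
table.normalize_all()
print(table.get(1))  # (0.0, 0.6, 0.8, 0.0)
```

Nothing in the storage says “quaternion”, yet the code only makes sense once you accept that the quaternions are there.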
