Programming as Theory Building

Posted on Apr 21, 2021

I recently read a paper by Peter Naur which argues that programming is primarily an activity of building up theories.

…what has to be built by the programmer is a theory of how certain affairs of the world will be handled by, or supported by, a computer program.

The true cost of programming is not merely writing code; that part is relatively straightforward. The true cost is building up, maintaining and communicating theories about both the problem and the solution. The higher the quality of the code, the better the correlation between the theory of the problem and the theory of the solution [1].

What characterizes intellectual activity, over and beyond activity that is merely intelligent, is the person’s building and having a theory, where theory is understood as the knowledge a person must have in order not only to do certain things intelligently, but also to explain them, to answer queries about them, to argue about them, and so forth.

Thinking of programming in this way, as building up and manipulating theories with code as a secondary output, gives an interesting lens on why some activities are hard, such as handing an existing program over to a new team (they have to build up the theories from scratch) or making modifications.

The expectation that program modifications at low cost ought to be possible is one that calls for closer analysis. First, it should be noted that such an expectation cannot be supported by analogy with modifications of other complicated man-made constructions. Where modifications are occasionally put into action, for example in the case of buildings, they are well known to be expensive and in fact complete demolition of the existing building followed by new construction is often found to be preferable economically. Second, the expectation of the possibility of low cost program modifications conceivably finds support in the fact that a program is a text held in a medium allowing for easy editing. For this support to be valid it must clearly be assumed that the dominating cost is one of text manipulation.

This framing really resonates with me, as I think it elegantly captures the essence of what makes certain programming tasks difficult. Take editing code in a legacy system. When the people who built the system are long gone, the theories they had about the problem and their solution are also gone, or at best imperfectly communicated [2]. This makes changes difficult, because you have no way to conceptually manipulate the theory and see how your change would fit into it. Similarly, I am sure we have all felt the pain of trying to write code for a new problem before we have built up adequate theories about it.

Building that theory first will allow the secondary part of programming, manipulating some text so the theory of the solution matches the theory of the problem, to flow more easily.

  1. This actually aligns pretty closely with the original definition of technical debt by Ward Cunningham, which is a mismatch between your understanding of the problem and the solution. It is explained well by George Fairbanks.

  2. This also feels like it reflects the results of the Microsoft research that I wrote about recently. Higher quality code was written by smaller, more cohesive teams with less transient membership. I wonder how much this is because you have fewer cases of lost theories.