The Elements of Code

Introduction

It is an old observation that the best writers sometimes disregard the rules of rhetoric.

When they do so, however, the reader will usually find in the sentence some compensating merit, attained at the cost of the violation. Unless he is certain of doing as well, he will probably do best to follow the rules.

After he has learned, by their guidance, to write plain English adequate for everyday uses, let him look, for the secrets of style, to the study of the masters of literature.

Strunk and White, The Elements of Style

Just as The Elements of Style gave millions of schoolchildren a reference for writing solid, acceptable English prose, this book aims to be a reference for programmers looking to write solid, maintainable code. It does this by providing a series of rules to follow, with examples related to their use. It starts with a focus on code construction and organization, followed by naming, documentation, and tests, and finishes with the importance of practice and refactoring.

The virtues outlined by The Elements of Style are also the virtues of good code:

  • Brevity - make individual lines short (this is not about reducing the total number of lines).
  • Clarity - use good names and constructs that can be quickly understood.
  • Flow - minimize branching logic and the need to re-read sections of code.
  • Simplicity - focus on composition to avoid unnecessary abstractions and duplication.
  • Unity - make modules in the codebase that work well together, and keep them orthogonal; that is, modules should not overlap in their roles.

While reading the rules in this book, consider how they tie into those virtues.

In addition, we will introduce a more measurable, vital concept: Mean Time to Comprehension (MTC). MTC is the average amount of time it takes for a programmer familiar with the given language, syntax, libraries, tooling, and structure to understand a particular block of code. Though a precise quantification of MTC may be possible, in most circumstances it would be prohibitively expensive. The goal is not precision; it is to gain good instincts about tradeoffs. Often, understanding a block of code requires understanding additional code elsewhere. In the worst case, this includes knowledge of the entire codebase, all its dependencies, and other systems it calls into. To make even the smallest contribution to such a project, significant time and effort must first be expended to acquire the necessary knowledge.

Programmers assume they can understand a block of code in isolation, and we should strive to make that assumption correct.

The more code a programmer must understand beyond what they are immediately interacting with, the greater the MTC cost.
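As a rough illustration of this point, the sketch below contrasts two hypothetical functions. The names `TAX_RATE`, `total_with_hidden_rate`, and `total_with_explicit_rate`, and the rate itself, are invented for this example. The first function can only be understood by locating module-level state defined elsewhere; the second can be understood in isolation.

```python
# Hypothetical example: names and values are assumptions for illustration.

# Module-level state. In a real project this might live in another file,
# and the reader would have to find it before the first function makes sense.
TAX_RATE = 0.08


def total_with_hidden_rate(subtotal: float) -> float:
    """Higher MTC: the result depends on TAX_RATE, defined outside this block."""
    return subtotal * (1 + TAX_RATE)


def total_with_explicit_rate(subtotal: float, tax_rate: float) -> float:
    """Lower MTC: everything the calculation needs is visible in the signature."""
    return subtotal * (1 + tax_rate)
```

Both functions compute the same value for the same rate. The difference is only in how much surrounding code a reader must hold in mind to be sure of that.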

Our goal as programmers is to make MTC as low as possible while adhering to the specification of the project.

Specifications all follow a similar pattern:

  • The program must accept some form of inputs.
  • The program must process those inputs under some set of resource constraints (CPU, memory, bandwidth, time, etc).
  • The program must only do what is expected of it, and nothing more.
  • The program must generate precise outputs.

Occasionally, adhering to the specification is in conflict with MTC reduction. Most often this occurs in high-performance systems, which may require magic numbers or specialized CPU instructions.

However, reducing MTC is usually necessary to write programs which adhere rigorously to their specification. We will keep this objective in mind as we review examples throughout this book.

The Elements of Code expects the reader to be familiar with basic programming concepts, but not to be an expert. Of course, even the most experienced programmer may find benefit in revisiting the fundamentals on occasion. This book is a practical guide to programming, focused on application rather than theory. It contains the rules for writing code, such that you can reference them, and eventually, know when to break them.

A Message in a Bottle

In 1977, Ann Druyan sat down and thought intently about falling in love. She also thought about the history of the Earth, problems that faced the planet, and other emotions. While she thought, her brainwaves were recorded, and then transcribed onto two records.

Those records were called the “Voyager Golden Records” and were launched into space as part of the Voyager mission. They contained more than just brainwaves; they also encoded hours of various languages, natural sounds and music, along with 116 images of humans, wildlife, and scientific diagrams. They were created on the off chance an alien species runs across a Voyager probe in deep space.

Ann was the creative director of the project, and she faced a difficult technical challenge: how do you communicate with a species without sharing a language, culture, or even common experience of the natural world?

The project found its answer in fundamental physics. They used the period of the hydrogen atom's hyperfine transition as a single clock cycle. From that, they derived mappings to other concepts, including the mechanism for playback of the record and how to decode the images. It was a work of brilliance.

Despite the brilliance of the scientists, the Voyager Golden Records are a message in a bottle, sent to destinations uncharted and peoples unknown. We cannot say who, if anyone, will receive them, or how they will interpret them.

The Voyager team could only communicate as clearly and intentionally as possible, hoping for the best.

In software development, we face a similar, albeit easier, message-in-a-bottle problem: we do not know who will need to read and understand what we were doing, or what our intentions were. We must therefore communicate as effectively and intentionally as possible. Our reader may be us in three weeks, or it may be someone we have never met - and will never meet - ten years from the day we wrote the code. If a programmer attempts to work with your code years after you wrote it, and you happen to be around to answer their questions, it is a wonderfully humbling experience to watch them struggle through your creation.

Working with software requires comprehension of that software, and this means writing it is an exercise in communicating with future readers. Effective communication is difficult, and often programmers mistakenly believe that because their program is successfully understood by the compiler or runtime, it can be successfully understood by other programmers. But code viewed as instructions can only ever communicate behavior, not intent, as it is prescriptive in nature. Our software encompasses much more than simply an instruction set - it is our model of reality, and it embodies our understanding of how to solve the problems at hand. To convey that understanding, we need more than code that merely compiles: we need to communicate complex ideas to future readers.

To reduce the burden of communication, we must also focus on minimizing the complexity of our software. In software engineering, there is “accidental complexity,” as opposed to “inherent complexity.” Inherent complexity is the fundamental difficulty present in solving the problem: some problems are hard. Accidental complexity is the difficulty not inherent in the problem: all the convolution we create in our pursuit of addressing the real concerns of the business.

The rules presented here are an antidote, in part, to accidental complexity.

Pillars of Communication

The four pillars of communication are the available mechanisms to communicate with future readers from within the codebase. They provide different levels of flexibility, which is dictated by the computer: some decisions must conform to strict syntax, some decisions have restricted choices, and other decisions have unbounded options.

Outside the codebase, there are many additional ways of communicating: discussions between the author(s) and those new to the code, recordings of use, diagrams placed in wikis, bug tracking threads, emails and messages, even interactive LLMs with knowledge of the code and supplemental materials. Because those exist outside of the codebase, we will not cover them in this book. Instead, we will focus our attention on the software project (its source code and supporting elements) created during the development process.

As you write software, consider what knowledge you currently possess that should be communicated to future programmers to keep that software stable, extensible, and easy to work with. Use these four pillars to convey that knowledge.

Structure

Structure, the first pillar of communication, is how the code is organized, and includes file system hierarchy, line placement, syntactic choices, polymorphic abstractions, and the like. It is the first level of flexibility: the programmer has many critical choices, but those choices must still be interpreted and executed by the computer. Often, programmers fail to realize the way they structure their code indicates expected ways of interaction; notably, structural patterns are copied when additions are made. Failing to communicate how to extend the code through good structural choices results in exponential complexity growth as the codebase increases in size. This is often seen in projects with conditionals duplicated throughout the code, or large sections of code copy-pasted to achieve some new functionality. Good structure leads to orthogonal additions, which have logarithmic complexity growth. The first several chapters of this book are primarily concerned with this element of communication.

Names

Names are the identifiers given to parts within the code. They are the second level of flexibility: the programmer can choose arbitrary names, but because those names will be used by the computer, they must adhere to some rules. Bad names cause the reader confusion and can lead to misuse, and through that misuse, introduce bugs. On the other hand, good names reduce MTC and allow readers to rapidly understand the intent behind even small sections of code. This pillar is covered in detail in Chapter 7, “Naming.”
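A small sketch of the difference, using a hypothetical billing calculation (the function names and the scenario are assumptions made for this example). Both functions perform identical arithmetic; only the names differ.

```python
def calc(a: float, b: int) -> float:
    """Poor names: the reader must guess what a, b, and 60 mean,
    and could easily pass the arguments in the wrong order."""
    return a * b / 60


MINUTES_PER_HOUR = 60


def billable_amount(hourly_rate: float, minutes_worked: int) -> float:
    """Descriptive names: the intent is readable from the signature alone."""
    return hourly_rate * minutes_worked / MINUTES_PER_HOUR
```

A call such as `billable_amount(hourly_rate=120.0, minutes_worked=30)` can be checked by eye, whereas `calc(120.0, 30)` sends the reader back to the definition.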

Tests

Tests, specifically unit tests, communicate usage. They show precisely how the code can be composed to accomplish a given task. They act as a set of executable, verifiable examples that let readers know how to properly work with the codebase. They are the third level of flexibility: while they must be runnable by a computer, and thus have the same structural and naming constraints as the primary source code, the programmer has freedom to write as many tests as necessary to communicate the appropriate usage. Good tests inspire confidence when making changes, as they guarantee the covered behaviors remain consistent. Additionally, they act as a reference manual, and can be used to rapidly understand specific behaviors without needing comprehensive knowledge of the entire system. In large projects with many developers, it may be difficult (or perhaps impossible) for any single individual to know every aspect of the system, so being able to understand and work with discrete parts is vital. Tests are covered in detail in Chapter 9, “Unit Testing.”
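As a minimal sketch of tests acting as executable examples, consider a hypothetical helper named `slugify` (invented for this illustration, not part of any library referenced in this book). Each test name states a behavior, and each body shows a call a reader could copy.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical helper: lowercase a title and join its words with hyphens."""
    return "-".join(title.lower().split())


class SlugifyUsage(unittest.TestCase):
    """Each test doubles as a usage example for slugify."""

    def test_words_are_lowercased_and_joined_with_hyphens(self):
        self.assertEqual(slugify("The Elements of Code"), "the-elements-of-code")

    def test_surrounding_and_repeated_whitespace_is_ignored(self):
        self.assertEqual(slugify("  Message   in a Bottle "), "message-in-a-bottle")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")
```

Saved in a module, these can be run with `python -m unittest`. A reader who has never seen `slugify` learns its edge cases from the test names without opening the implementation.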

Documentation

Documentation, the last pillar of communication, is the most flexible. It grants the programmer the ability to communicate ideas and intentions in a completely out-of-band manner, without regard to any constraints imposed by the computer. Not all aspects of a system can be communicated through the other three pillars, so documentation acts as an “escape hatch,” a final option for ensuring our intent and knowledge are preserved. As we will see, this is simultaneously terrific and terrifying. Documentation is covered in detail in Chapter 8, “Documentation.”
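One way to picture the escape hatch is a docstring that records a reason the code cannot express. The function below and the rationale in its docstring are hypothetical, invented for this sketch. The arithmetic tells the reader what happens; only the documentation says why.

```python
def retry_delay_seconds(attempt: int) -> float:
    """Return how long to wait before retry number `attempt` (starting at 1).

    Intent (hypothetical, for illustration): the delay doubles on each
    attempt so that a struggling downstream service has room to recover,
    and it is capped at 30 seconds so that an interactive user is never
    left waiting longer than that. Neither reason can be recovered from
    the arithmetic alone.
    """
    return min(0.5 * 2 ** (attempt - 1), 30.0)
```

Without the docstring, a future reader might reasonably "simplify" the cap away or change the base delay, unaware of the constraint each value was protecting.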

Wrong in Correctable Ways

Every software project carries with it “behavioral models.” These are representations in code of the various ways users can interact with the system, and ways for enhancing the system to enable new interactions. Behavioral models emerge over the course of a software project. Sometimes, these are carefully designed into a specification at the beginning of the project, and then implemented. Sometimes, they are developed as the project evolves and more is understood about the required behaviors. Often, it is a combination of both.

Over a long enough timescale, the behavioral models will always be wrong. This is important to understand and accept. The world is constantly changing and evolving, and as a result the behaviors of our software must also change. We will find bugs that must be fixed, additions that need to be made, and critically, ways in which our model was incomplete or inaccurate. Perhaps we made company management software that assumed a user could only be associated with a single manager, but the organization decided to try out dual reporting lines and wants that to be represented in the system. Perhaps we have a commerce platform where we must calculate total pricing and include taxes, and an additional tariff is levied on certain products but not others, so the system must now accommodate different pricing structures for different goods. Or perhaps it is as simple as the database we were using going end-of-life, necessitating a change to an entirely different storage mechanism.

Following the rules in this book does not guarantee that the behavioral models will be correct for all possible future modifications; no rules could make that guarantee. Instead, the rules provide a path to ensure that most of the time, most of the code will not need to be modified, and that the process to update the model is relatively painless.
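The pricing scenario above can be sketched as follows. This is one possible arrangement under stated assumptions, not a prescribed design: the `Product` fields, the 10% and 25% rates, and the names `sales_tax`, `steel_tariff`, and `total_price` are all invented for illustration. The original model knew only about sales tax; when a tariff arrives, it is added as a new function, and the existing code is left untouched.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Product:
    name: str
    price: float
    category: str


# A surcharge takes a product and returns the extra amount owed on it.
Surcharge = Callable[[Product], float]


def sales_tax(product: Product) -> float:
    """Original model: a flat (hypothetical) 10% tax on every product."""
    return product.price * 0.10


def total_price(product: Product, surcharges: List[Surcharge]) -> float:
    """Base price plus every applicable surcharge."""
    return product.price + sum(surcharge(product) for surcharge in surcharges)


# Later change: a (hypothetical) 25% tariff on steel goods only.
# Nothing above this line needs to be modified to support it.
def steel_tariff(product: Product) -> float:
    return product.price * 0.25 if product.category == "steel" else 0.0
```

The first model was wrong in the sense that it did not anticipate tariffs, but it was wrong in a correctable way: callers switch from `[sales_tax]` to `[sales_tax, steel_tariff]`, and products outside the tariffed category are priced exactly as before.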

This book is not about being correct; it is about being wrong in correctable ways.

Those goals (minimizing the need for modification and making additions painless) are achieved by reducing MTC, communicating effectively with future readers, and presenting code in a structure that can be easily extended or rearranged. Each chapter in this book will bring us closer to this overall vision.

Chapter 2 is a high-level overview of code structure.

Chapters 3, 4, 5, and 6 contain some key concepts in programming, along with tactical guidance for code construction.

Chapters 7, 8, and 9 are about communicating intent via names, documentation, and tests.

Chapter 10 is about putting together these concepts to improve existing code, or to create new code.

The final chapter is about the learning process, practice, and mastery.

The examples throughout the book mostly use Python, but include other languages. This is because the rules are applicable to a wide variety of contexts; readers are encouraged to think about the examples and how they apply to their own language and environment.

The Elements of Code has a strong focus on mechanisms and structure over higher-level decisions such as design and abstractions. As a result, we will go over how to replace bad patterns with better ones, and how to think about code, rather than concepts such as determining a project’s model, architecture, or module design.