Documentation
Incorrect documentation is often worse than no documentation.
Bertrand Meyer
Documentation in code should focus on how to use the code correctly, or why odd code exists, but avoid detailing exactly what the code is doing.
Documentation embedded in source code is referred to as “comments”. Though comments may not be recognized by the compiler or language runtime, they are an integral part of the code. They are the fourth pillar of communication and are the only free-form communication mechanism available to programmers. Because of that flexibility, they can be both incredibly valuable, and incredibly dangerous.
Doc Blocks
Doc blocks are comments attached to functions, classes, modules, namespaces, or other multi-statement constructs, and are generally multi-line, creating a “block” of text, as seen in the following Python example.
def hello(value):
    """
    Prints the given value, prefixed by "hello, "
    """
    print(f"hello, {value}")
One of the dangers of doc blocks, or any type of comment for that matter, is the lack of enforced accuracy. By its very nature, code is an accurate representation of what it does. It is possible for names to be misleading, but reading the code will reveal precisely what its execution will do.
Comments do not behave like this. Nothing ensures the comments are truthful, and in fact time often creates “accuracy drift”, as code is updated while comments are not:
def hello(value):
    """
    Prints the given value, prefixed by "hello, "
    """
    print(f"Hello, {value}")
Now, if we read the comment but not the code, we expect the output of hello("world!") to be “hello, world!”, when in fact it would be “Hello, world!”. A subtle difference, but in critical systems accuracy drift creates bugs, confusion, and miscommunication.
Fortunately, there is quite a bit of tooling to help us when it comes to doc blocks. This tooling parses the signature of the identifier (often a function or class), and generates default documentation which is both human readable and partially machine readable. This allows IDEs to highlight when our documentation is suffering from identifiable forms of accuracy drift.
Here is an example:
def add_two(num):
    """
    Adds two to the given number.
    Args:
        num (int): The number to increment by two.
    Returns:
        int: The incremented value.
    """
    return num + 2
In this case, the documented argument names, the existence of the arguments and the return value, and their type information can all be checked against the code. IDE tooling can then highlight areas that are out of sync as a gentle reminder to correct the docs.
For Python, Elixir, Lisp, Clojure, and some other languages, doc block notation like this is included with the code as a “doc string”. It is attached to the documented object (in Python, as the function's __doc__ attribute), and is then interrogable by the code itself.
For example, if we were to run help(add_two) within a Python shell, we would see the output of the doc string we attached to the function.
>>> help(add_two)
Help on function add_two in module __main__:

add_two(num)
    Adds two to the given number.
    Args:
        num (int): The number to increment by two.
    Returns:
        int: The incremented value.
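Because the doc string is available at runtime, a basic accuracy drift check can be sketched in a few lines. This is only an illustration of the idea, not how any particular IDE works: the helper names documented_args and undocumented_or_stale are hypothetical, and the parser assumes the "name (type): text" layout used in this chapter.

```python
import inspect
import re


def add_two(num):
    """
    Adds two to the given number.

    Args:
        num (int): The number to increment by two.

    Returns:
        int: The incremented value.
    """
    return num + 2


def documented_args(func):
    """
    Returns the argument names listed under "Args:" in func's doc string.

    Hypothetical helper: it only understands the "name (type): text" layout.
    """
    doc = inspect.getdoc(func) or ""
    names = []
    in_args = False
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
        elif stripped.endswith(":") and " " not in stripped:
            # Another section header, such as "Returns:", ends the Args section.
            in_args = False
        elif in_args:
            match = re.match(r"(\w+)\s*\(", stripped)
            if match:
                names.append(match.group(1))
    return names


def undocumented_or_stale(func):
    """Returns (missing, stale) argument names, comparing docs to the signature."""
    actual = set(inspect.signature(func).parameters)
    documented = set(documented_args(func))
    return sorted(actual - documented), sorted(documented - actual)


# Both lists are empty when the docs match the signature.
print(undocumented_or_stale(add_two))  # ([], [])
```

If the parameter were later renamed without updating the doc string, the new name would appear in the first list and the old name in the second.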
Many languages also have support for “doctests”, where examples in their doc blocks can be introspected, executed, and validated. This further helps prevent accuracy drift.
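In Python, for instance, the standard library's doctest module does this. The sketch below reuses add_two from earlier; the two examples in its doc string were added for this illustration.

```python
import doctest


def add_two(num):
    """
    Adds two to the given number.

    >>> add_two(2)
    4
    >>> add_two(-2)
    0
    """
    return num + 2


# Executes each ">>>" example in the doc string and compares the real
# output with the documented output. Nothing is printed when they all match.
doctest.run_docstring_examples(add_two, {"add_two": add_two})
```

If the implementation changed to return num + 3, the run would report both examples as failures, so the drift would surface the next time the checks ran. A whole file can be checked the same way with python -m doctest followed by the file name.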
Inline Comments
Inline comments are comments occurring just before or on the same line as a statement of code, with the intention of explaining that line, and perhaps several of the following lines as well.
In most instances, inline comments should be avoided. Generally, they indicate the code is overly confusing and complicated, and perhaps not well thought-out. Additionally, as there is very little accessible tooling around inline comments, they are even more prone to accuracy drift than doc blocks.
A bad use of inline comments may look something like the following:
def inscrutable(a, b, c):
    # Takes the list of lists from a, filtering out the false-y values,
    # and then multiplying them by b to assign them
    # as the key in a dict for value c
    return {(v * b): c for x in filter(lambda a: a, a) for v in x}
Let's Get Technical
Some readers may note that the refactored function in this section mutates the state of result, despite being advised against this in Chapter 3, "State". This is true, and an important observation. In the tradeoff between mutation and MTC, assuming the mutation is carefully isolated, we should prioritize the reduction of MTC.
Clearly, the MTC of this code is very high, and the comment is being used to try and decrease it.
Instead, we should use the code itself to reduce MTC, by refactoring it. In this case, that means adding more lines of code, where each line prioritizes brevity.
def scrutable(matrix, key_multiple, value):
    result = {}
    for sublist in matrix:
        for v in sublist:
            key = v * key_multiple
            result[key] = value
    return result
The following is another improper use of inline comments, as it describes what the code does rather than why it does it.
def apply_standard_percentage(b):
    return b * 0.2 # Gets the standard 20%
In such cases, doc blocks should certainly be used instead. It is downright horrific when those comments start to suffer from accuracy drift, and the programmer reading them doesn’t realize it:
def apply_standard_percentage(b):
    return b + 0.3 # Gets the standard 20%
When this happens, it may take hours or even days for the programmer to realize the program is not behaving as expected due to inaccurate comments.
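One way to follow the advice about doc blocks is sketched below. The constant name STANDARD_RATE and the parameter name base are hypothetical, chosen for this illustration. The idea is that the "what" moves into a name the code actually uses, while the doc block describes how to use the function.

```python
# Hypothetical name: the single place where the standard percentage is defined.
STANDARD_RATE = 0.2


def apply_standard_percentage(base):
    """
    Returns the standard percentage of the given base amount.

    Args:
        base (float): The amount to take the standard percentage of.

    Returns:
        float: The base multiplied by STANDARD_RATE.
    """
    return base * STANDARD_RATE
```

Because the doc block refers to the constant by name rather than repeating its value, changing the rate does not leave a stale "20%" behind in a comment.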
Not all inline comments are bad. They are helpful and necessary when there is critical information out of the programmer’s control that must be communicated. For example, documenting why a particular decision was made when another one seems more reasonable:
def process(api, task):
    prepared_task = prepare_task(task)
    # There is an active bug in the API lib tracked here:
    # <link>
    # The workaround is to manually initialize and submit the task.
    initialized_task = api._initialize_task(prepared_task)
    api._submit_task(initialized_task)
This comment is not about what we are doing, but about why we are doing it. That is where the value of comments shines: when the decision-making reasoning must be expressed. Code (assuming it has been written properly) already tells us “what”, more precisely than we could ever do with human language. However, it lacks information on the motivation and reasoning behind the decisions. That information is “out-of-band”; it exists in the programmer’s brain (hopefully), and can only be expressed through normal human language channels.
When bugs, odd APIs, or unexpected values force us to write surprising code, documenting that fact is crucial, and inline comments are an excellent way to do so.
Another reason to add inline comments is when some piece of code contains inherent complexity, complexity that necessarily exists within the problem, and we cannot write our code differently and make it disappear.
void complex(int a, int b) {
    // Assume more code here...
    // Swap the values of a and b without using a tmp variable,
    // as we have limited memory
    a = a ^ b;
    b = a ^ b;
    a = a ^ b;
    // And more code here...
}
Explaining the reasoning for the use of esoteric syntax is incredibly helpful for decreasing MTC.
Conclusion
Documentation is the most powerful out-of-band communication method we have, but it can easily be abused. Be meticulous with your docs, and future programmers will thank you. Be frivolous and they will curse you.
If you spend five minutes writing documentation to describe your intent and save a future programmer an hour, it is time well spent. If you spend two minutes updating the documentation in a section of code to ensure it remains accurate, and save a future programmer a day, it is definitely worth it.
If you have ever read code and thought, “What were they thinking?”, well, documentation is your opportunity to tell a future reader the answer.