The Elements of Code

Naming

Names are deeply meaningful to your brain, and misleading names add chaos to your code.

Andrew Hunt, The Pragmatic Programmer: From Journeyman to Master

Rule: Think Carefully About Names

Names should aspire to create the correct semantic map in the reader's mind with as few characters as possible. Names must be as consistent as possible.

Naming is jokingly said to be one of the hardest problems in programming. Like most jokes, however, the statement isn’t entirely spurious: choosing good names can immediately clarify a problem, as the problem becomes associated with the ideas behind the name. Choosing bad names can force programmers to think about a problem incorrectly, wasting time or even causing bugs.

Because well-chosen names provide an immediate framework of ideas, they are one of the four pillars of communication (see Chapter 1, “Introduction”).

Semantic Mappings

Let’s look at two examples to demonstrate the power of names. Does this code make a network request?

user = user_manager.get(user_id)

What about this code?

user = user_manager.fetch(user_id)

Of the two, most people believe the latter one is more likely to make a network request. The term get is entirely generic, and simply means to retrieve something with no notion of proximity. However, fetch brings to mind the idea of a dog chasing a stick after it has been thrown, and bringing it back. It is the act of leaving to retrieve something, and then returning with the item. As network requests involve data leaving the system, and usually result in data being brought back in the form of a response, fetch leverages our semantic mapping of behavior.

Semantic mappings are the connections between our understanding of the meaning of language and our understanding of some synthetic construct. When creating names in code, we want to leverage semantic mappings as much as possible, guiding readers to the most likely interpretation and purpose of the code. If we fail to do so, readers will often apply incorrect assumptions to what the code does. In the best case, this will cost them time and cause frustration; in the worst case, it will cause critical bugs as functions are misused. In the example with “fetch” vs. “get”, the reader could reasonably assume “get” retrieves local data, and be surprised by unexpected latency or errors when calling the function. Similar issues arise with “set” vs. “store”, “subscribe” vs. “stream”, “create” vs. “open” vs. “connect”, and other similar words.
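As a minimal sketch of the distinction, consider a hypothetical UserManager that offers both methods. The class, its remote_users stand-in, and the remote_calls counter are assumptions made for illustration; no real network is involved.

```python
class UserManager:
    """Hypothetical manager whose method names signal where the data lives."""

    def __init__(self, remote_users):
        # remote_users stands in for a remote service; a real one would
        # involve a network call with latency and possible failures.
        self._remote_users = remote_users
        self._cache = {}
        self.remote_calls = 0

    def get(self, user_id):
        """Return a locally cached user, or None. Never leaves the process."""
        return self._cache.get(user_id)

    def fetch(self, user_id):
        """Go out to the (simulated) remote service and bring the user back."""
        self.remote_calls += 1
        user = self._remote_users[user_id]
        self._cache[user_id] = user
        return user


user_manager = UserManager(remote_users={42: "ada"})
print(user_manager.get(42))    # None: nothing cached locally yet
print(user_manager.fetch(42))  # ada: one simulated remote call
print(user_manager.get(42))    # ada: now served from the local cache
```

With both names available, a reader can tell from the call site alone which operation might be slow or might fail.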

A Rule of Thumb

One of the best ways to choose a good name is to describe the purpose of the code to someone. During the description, some key words will be used. Those words can be used to create a good name.

For example, take the following description:

So, this function takes a path to some diagnostic data, and based on the directory and file names it parses out a bunch of metadata details about the diagnostic data. It constructs an object with that data and returns it.

The key words are the descriptive ones: path, diagnostic, parses, metadata details, and constructs. We could call the function ParseDiagnosticPathAndConstructMetadataDetails, but that is so long that it is difficult to read. Instead, reduce it to only the most critical information: it determines diagnostic metadata from a path.

Perhaps diag_info_from_path. It's still a bit long, and the abbreviation “diag” is used, but it is nicely descriptive. If the function were attached to a DiagInfoBuilder class, it could be simplified to from_path, as the diag_info part is implied by the class name.

The process of describing the thing in plain English and identifying the key words is the most useful tool in creating good names. If the words used to name something are not used in its description, consider updating the name.
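To make the result concrete, here is one way diag_info_from_path might look. This is only a sketch: the directory layout <host>/<date>/<component>.<extension>, the DiagInfo fields, and the sample path are all assumptions, since the description above does not specify them.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class DiagInfo:
    """Metadata recovered from where a diagnostic file is stored."""
    host: str
    date: str
    component: str


def diag_info_from_path(path):
    """Determine diagnostic metadata from a path.

    Assumes the hypothetical layout <host>/<date>/<component>.<extension>.
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 3:
        raise ValueError(f"expected <host>/<date>/<file>, got {path!r}")
    # Only the last three components carry metadata in this layout.
    host, date, filename = parts[-3:]
    return DiagInfo(host=host, date=date, component=PurePosixPath(filename).stem)


info = diag_info_from_path("web01/2024-03-05/scheduler.log")
print(info)
```

Note that the docstring's first line repeats the key words from the spoken description, which is a quick check that the name still matches the purpose.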

Consistency

The most important rule for names is to be consistent. A bad name used everywhere must only be learned once; a good name used everywhere except a few places must be learned multiple times. Inconsistent names create higher cognitive load and increase MTC.

Consider the following Python example where the same concept has multiple names.

def process_all(users):
    for person in users:
        person.process()

def main():
    admins = load_records()
    process_all(admins)

The inconsistency in the above example forces the reader to memorize what a variable is called and in what context. If instead the name is consistent, it is much simpler to follow what is happening.

def process_all(admins):
    for admin in admins:
        admin.process()

def main():
    admins = load_admins()
    process_all(admins)

Even if the name were bad and had no semantic mapping (for example, x), being consistent means that once we learn what x is, we no longer need to keep learning all the new names it might have.

Additionally, names should match the style that surrounds them. Consistency means that if a convention used in one specific function is different from the rest of the project, it is more important to match the style of that function than the style of the project. The priority is: function, class, module, package, project. It can be helpful to update code to make style conventions consistent across a project, but it should not be done as part of a behavioral change, as that makes it difficult for readers looking at the modifications to determine what is behavioral and what is stylistic. Instead, when updating style, create separate commits (if working in Git) that only contain stylistic changes. Additionally, recognize that there is a tradeoff: non-behavioral changes will impact the project history when examining it through version control.

Names and Access

The Law of Demeter (LoD, discussed in Chapter 10, “Refactoring”) states that code must ask for all of its dependencies directly, instead of requiring a container and then reaching inside it to pull out the actual dependencies.

Part of this has to do with the value of names and access. As LoD forces us to choose names when creating a function signature, we enhance communication: a function signature with logger, auth_service, user_report communicates much more than a function that simply requires dependencies.

def execute(dependencies):
    pass

def execute(logger, auth_service, user_report):
    pass

The first example communicates nothing: every argument is a dependency by definition, so the reader has no clue what is actually required to run the function.

The second example uses meaningful names. If those names are consistent throughout the application, the requirements for the function are much more obvious. In typed languages, requirements are even more explicit, since the reader knows precisely the type of each argument.
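Python's optional type hints can approximate what a typed language provides here. The following sketch assumes Python 3.9 or later; the AuthService protocol, the UserReport fields, the AllowList stand-in, and the body of execute are all hypothetical and exist only to show how much the signature communicates.

```python
import logging
from dataclasses import dataclass
from typing import Protocol


class AuthService(Protocol):
    """Anything that can decide whether a user may run the report."""
    def is_authorized(self, user_name: str) -> bool: ...


@dataclass
class UserReport:
    user_name: str
    lines: list[str]


class AllowList:
    """Tiny stand-in that satisfies the AuthService protocol."""
    def __init__(self, allowed: set[str]) -> None:
        self._allowed = allowed

    def is_authorized(self, user_name: str) -> bool:
        return user_name in self._allowed


def execute(logger: logging.Logger, auth_service: AuthService,
            user_report: UserReport) -> bool:
    """Log the report if its owner is authorized; return whether it ran."""
    if not auth_service.is_authorized(user_report.user_name):
        logger.warning("rejected report for %s", user_report.user_name)
        return False
    for line in user_report.lines:
        logger.info("%s", line)
    return True


ran = execute(logging.getLogger("reports"), AllowList({"ada"}),
              UserReport("ada", ["ok"]))
print(ran)
```

The names say what each dependency is for, and the annotations say what each one must be able to do.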

Affixes

Affixes are either prefixes or suffixes, and are used in names to help communicate purpose. Two common affix types are Hungarian notation and namespaces. Let’s consider each of these in turn.

The Role of Type Safety

Types are one of the best ways programmers have to communicate information. However, the support for types varies wildly between languages.

To help convey important information in the absence of type support, there are two affix conventions: Systems Hungarian and Apps Hungarian.

Systems Hungarian

In Systems Hungarian, variable names are prefixed with physical type indicators, such as str_name or int_count, related to how the data is stored physically in memory. In languages with weak type safety, this can be very helpful.

For example, we can use it to communicate the type of argument a particular function needs in code:

def process(int_timeout):
    """
    Process, but timeout if it takes too long
    """

In large codebases, it can be useful to have the physical type information associated with the variable. However, Systems Hungarian should not be used in languages with strong type safety, where the physical type information is already associated with the variable.

For example, in Go:

func Process(intTimeout int) {}

In such cases, its use is redundant and distracting.

Apps Hungarian

Apps Hungarian affixes variable names with their logical type, or semantic information, meant to indicate the variable’s purpose or intended use.

For example:

def process(timeout_ms: int):
    """
    Process, but timeout if it takes too long
    """

In the above example, we use Apps Hungarian to denote that the timeout value is in milliseconds (logical information), and use Python type hinting to indicate that it is an int (physical information).

Apps Hungarian is helpful in languages that do not support logical type aliasing. Communicating the intended purpose of a variable through its name helps the reader determine whether it is being properly used, though the programmer must take care to be consistent with the Apps Hungarian affix that is used.
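Unit suffixes are most useful at the point where units change. In the sketch below, the wait_for helper and its poll_interval_ms parameter are hypothetical; time.monotonic and time.sleep are the standard library calls, both of which work in seconds.

```python
import time


def wait_for(condition, timeout_ms: int, poll_interval_ms: int = 10) -> bool:
    """Poll condition() until it is true or timeout_ms milliseconds elapse."""
    # The _s and _ms suffixes make the unit conversion visible to the reader.
    deadline_s = time.monotonic() + timeout_ms / 1000
    while time.monotonic() < deadline_s:
        if condition():
            return True
        # time.sleep expects seconds, so the conversion is explicit here.
        time.sleep(poll_interval_ms / 1000)
    # One final check after the deadline has passed.
    return condition()


print(wait_for(lambda: True, timeout_ms=50))
```

Had the code passed poll_interval_ms straight to time.sleep, the mismatch between the suffix and the expected unit would stand out in review.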

Some languages (such as Haskell) allow the aliasing of physical types such that new, logical types can be created. Apps Hungarian is unnecessary in those languages as an appropriate logical type can be created instead. This has the benefit of the compiler or runtime being able to enforce consistency. However, even in languages with sufficient type support, creating a new type for every distinct permutation of purpose can lead to code bloat and confusion, and should thus be used judiciously.

Let’s examine the difference between aliasing and Apps Hungarian in the following examples.

In this Python example, we denote whether the string is sanitized or not by including it in the variable name using Apps Hungarian.

raw_input = input("What is your username?")
sanitized_sql_input = sql_sanitize(raw_input)

In the following Go example, we create a new type to indicate if it is sanitized, and then our function parameters can specify whether they expect raw strings, or sanitized strings.

package main

import "fmt"

type RawInput string
type SanitizedSQL string

func Sanitize(input RawInput) SanitizedSQL {
    return SanitizedSQL(input)
}

func main() {
    var input RawInput

    fmt.Scanln(&input)

    username := Sanitize(input)

    fmt.Println(username)
}

Both of these examples are reasonable ways to convey necessary information, and which approach to use is dependent on the problem being solved and the capabilities available.
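Python sits between the two approaches: typing.NewType creates distinct logical types that a static checker such as mypy can enforce, although the interpreter itself does not. The sketch below mirrors the Go example under that assumption. The sanitize and build_query functions are hypothetical, and the quote-doubling is a placeholder rather than real protection against SQL injection.

```python
from typing import NewType

# Distinct logical types layered over str, checked statically only.
RawInput = NewType("RawInput", str)
SanitizedSQL = NewType("SanitizedSQL", str)


def sanitize(raw: RawInput) -> SanitizedSQL:
    """Placeholder sanitizer: doubles single quotes only.

    Not real protection against SQL injection; prefer parameterized queries.
    """
    return SanitizedSQL(raw.replace("'", "''"))


def build_query(username: SanitizedSQL) -> str:
    """Accepts only SanitizedSQL as far as a static type checker is concerned."""
    return f"SELECT * FROM users WHERE name = '{username}'"


raw = RawInput("O'Brien")
print(build_query(sanitize(raw)))
# build_query(raw) would be flagged by a static checker such as mypy,
# although Python itself would still run it.
```

At runtime both types are plain strings, so the safety here depends entirely on running the type checker.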

Let's Get Technical

Go also has type aliases, in the form:

// the "=" indicates an alias declaration
type A = T

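Had we used a type alias instead of a type definition, the Go compiler would have treated the alias and its underlying type as identical, so any value of the underlying type (here, any string) would have been accepted wherever the alias was expected.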

In that case, the type alias is purely to improve communication to the reader.

The primary consideration in using Hungarian notation is MTC: does the affix increase or decrease the time it takes for a reader to understand the intent of the code? The answer will not always be the same, so it is important to consider the MTC impact before applying the notation to your code. Having covered Hungarian notation, let’s move on to consider the other affix type: namespaces.

Namespaces

Let's Get Technical

Some programming languages have explicit constructs called "namespaces". These are often general-purpose ways of isolating variable names. Although those languages reserve the word for one specific construct, the term applies more generally to any construct that isolates named variables from each other.

Packages, libraries, modules, classes, and functions are all forms of namespaces, which use lexical scope (see Chapter 3, “State”). Namespaces allow us to structure code hierarchically, ensuring that using a name in one location does not conflict with using that same name in another location.

Code within a namespace should generally not be prefixed with the name of that namespace.

Take, for example, the Python module pathlib, and suppose its API were named like this:

import pathlib

pathlib_path = pathlib.PathlibPath(".")

Prefixing members with the name of their namespace causes several problems:

  • It is repetitive, forcing the programmer to type more and the program to be longer than necessary, without decreasing MTC.
  • If the namespace is renamed, all its members must correspondingly be updated.
  • It distracts from the underlying purpose of the code. When we repeat ourselves, it adds noise, and what the reader cares about becomes lost.

Fortunately, this is not the API of pathlib. In fact, very few standard libraries have this sort of redundancy in them.

import pathlib

path = pathlib.Path(".")

Avoiding repetitive names does not just apply at the module level. Repetition may be present in any namespace, even within a function, and should be corrected. The following is an extreme example of this bad practice:

# service.py

class ServiceAPI:
    def service_api_get(self, request):
        service_api_get_response = self._fetch_response(request)
        return service_api_get_response

# main.py
import service

response = service.ServiceAPI().service_api_get(request)

Attempting to read the preceding code aloud helps drive home the point: does it sound silly to repeat the words “service” and “API” so many times? Yes, and therefore it is probably equally silly to write the code this way.

A corrected version may look like this:

# service.py

class API:
    def get(self, request):
        return self._fetch_response(request)

# main.py
import service

response = service.API().get(request)

Note that we are using the term get rather than fetch, despite fetch having the stronger semantic mapping. This is because the context of the term is important. There is an HTTP method GET, and if we are constructing an API client that uses HTTP, we likely want to communicate that we are literally using the HTTP GET method by calling our API method get.
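A minimal sketch of such a client, using the standard library's urllib.request, might look like the following. The API class shape, the injectable send parameter, and the example.invalid URL are assumptions made so that the sketch can run without a network.

```python
import urllib.request


class API:
    """Hypothetical HTTP client whose method names mirror HTTP methods."""

    def __init__(self, base_url, send=urllib.request.urlopen):
        self._base_url = base_url.rstrip("/")
        # send is injectable so the sketch can be exercised without a network.
        self._send = send

    def get(self, path):
        """Issue an HTTP GET for path and return whatever send returns."""
        request = urllib.request.Request(self._base_url + path, method="GET")
        return self._send(request)


# Swap in a fake sender that echoes the request instead of hitting the network.
api = API("https://example.invalid/", send=lambda request: request)
request = api.get("/users/42")
print(request.get_method(), request.full_url)
```

Here the method name and the HTTP method are the same word, so the name describes exactly what goes over the wire.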

As with all the rules we’ve discussed, on occasion it may make sense to break this one. Sometimes many elements within different modules will share the same general names. In this case, to help avoid confusion across the project, it can make sense to prefix those elements with the related namespace purpose, so that searching across the project can yield more specific results. However, if you find yourself in this situation, it is worthwhile to think about how the code is constructed and whether there are clearer, less general names within the namespaces that could be used.

Conclusion

Coming up with good names can be really difficult, but is always important. Good names can dramatically reduce MTC, and bad names will invariably increase it.

Time spent thinking of a good name is almost always worth it, and often the process is easier than we believe if we approach it systematically. Leverage semantic mappings to communicate intent, and naming will become much easier.

Above all, be consistent. A reader having to learn a bad name is already bad; having to learn a bad name, and then having that name change into an okay name (much to the reader’s inevitable confusion), is even worse.