When is it okay to use short variable names?
MP 74: Naming things is hard, but we can get better at it.
I’ve been doing a lot of refactoring work lately. I expect new people to start contributing to the project I’m working on as it approaches its first stable release; with that in mind, one focus for this refactoring work is making the code more readable to people who aren’t already familiar with the overall codebase.
Recently I was working on a section of code that uses a number of list comprehensions. I rarely use single-letter variable names, and was surprised to find a number of places where single-letter names made the code more readable.1
Allowed modifications
The code I’m working on modifies the user’s project so it’s ready for deployment to a remote host. Here’s the main form of the command that people run when using this project:
$ python manage.py simple_deploy --platform <platform-name>
This command makes configuration changes to the user’s project based on the platform name they specify. Users can then run their platform’s deploy or push command, and they should have a working deployment of their project.
Ideally, the user should have a clean git status before running this code; all the changes made to their project should be contained in a single commit. This makes it easy to see what configuration changes were required for the target platform. It also allows the user to easily roll their project back to a clean state if they decide they don’t like any of the changes that were made.
When users run the simple_deploy command, it runs git status in the background before making any changes. If the output indicates the presence of uncommitted changes, the command exits with a message asking the user to commit their existing changes and then run the command again.
However, some uncommitted changes are acceptable and shouldn’t block the code’s execution. For example, the user may have added django-simple-deploy to their project’s requirements. Or they may have run the command once and fixed an issue that blocked configuration. Running the command creates a log directory, and that directory is added to .gitignore. We don’t want to block execution based on these kinds of changes.
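As a rough sketch of how this kind of check might gather its inputs, the snippet below runs git status --porcelain and turns each line of output into a path. The function names, the sample output, and the entries in allowed_modifications (including the log directory name) are assumptions made for illustration, not the project’s actual code. Renamed files, which git reports in the form old -> new, are not handled here.

```python
import subprocess
from pathlib import Path

# Hypothetical set of file and directory names that are acceptable
# to see in an otherwise clean working tree.
allowed_modifications = {"requirements.txt", ".gitignore", "simple_deploy_logs"}


def get_status_output():
    """Return the machine-readable output of `git status --porcelain`."""
    cmd = ["git", "status", "--porcelain"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_modified_paths(status_output):
    """Convert porcelain lines such as ' M requirements.txt' to Path objects."""
    # Each line is a two-character status code, a space, and then the path.
    return [Path(line[3:]) for line in status_output.splitlines() if line.strip()]


# Assumed sample output, so the parsing step can be tried without a repository.
sample_output = " M requirements.txt\n?? simple_deploy_logs/\n M blogs/views.py\n"
modified_paths = parse_modified_paths(sample_output)
print([path.name for path in modified_paths])
# ['requirements.txt', 'simple_deploy_logs', 'views.py']
```

In a real repository, the string passed to parse_modified_paths() would come from get_status_output() instead of the sample.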
Examining changed files
One of the simplest ways to check for uncommitted changes is to see which files have been changed. Here are two lines of code from a function that checks whether it’s okay to proceed with modifying the user’s project:
if any([path.name not in allowed_modifications for path in modified_paths]):
    return False
This code looks for any modified file that’s unrelated to a simple_deploy run. If any such files exist, the function returns False, indicating it’s not okay to proceed.
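To see those two lines in a runnable setting, here’s a minimal sketch of what the surrounding function could look like. The function name check_status_ok, its parameters, and the sample data are hypothetical; only the if block comes from the code discussed here.

```python
from pathlib import Path


def check_status_ok(modified_paths, allowed_modifications):
    """Return True if every modified path is an allowed modification."""
    if any([path.name not in allowed_modifications for path in modified_paths]):
        return False
    return True


# Assumed example data, for illustration only.
allowed_modifications = {"requirements.txt", ".gitignore"}

only_allowed = [Path("requirements.txt")]
has_unrelated = [Path("requirements.txt"), Path("blog/models.py")]

print(check_status_ok(only_allowed, allowed_modifications))   # True
print(check_status_ok(has_unrelated, allowed_modifications))  # False
```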
Significant and insignificant names
Let’s look at just the list comprehension in this code:
[path.name not in allowed_modifications for path in modified_paths]
We’re thinking about how to name things, so let’s write down all the names used here:
path
allowed_modifications
path
modified_paths
Two of these names are defined outside the comprehension: allowed_modifications and modified_paths. The other name, path, is only used inside the comprehension.2
Here’s a version of the comprehension that de-emphasizes the name path:
[p.name not in allowed_modifications for p in modified_paths]
We don’t often use single-letter variable names, because they lack context. But in a comprehension, all the context is contained in a single line. Using the name p here emphasizes a few things:
It’s the name attribute of the path that we’re focusing on;
We’re looking for names in allowed_modifications;
The paths we’re examining are coming from modified_paths.
These are exactly the things that I want to call the reader’s attention to, if they’re unfamiliar with this codebase.
This is especially noticeable if we make the opposite kind of change, to a more verbose set of names:
[modified_path.name not in allowed_modifications for modified_path in modified_paths]
This is a common way to name things if we’re accustomed to using plural names for lists, and then using the singular version of that name in the opening line of a for loop:
for modified_path in modified_paths:
    ...
In the context of a full loop, where the first line is less busy than a comprehension, this naming approach works. That’s especially true if the block that follows has any degree of complexity.
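For comparison, here’s a sketch of a loop where the longer name earns its place. The body is made up for illustration (the function name and the status labels are assumptions), but it shows how a descriptive loop variable helps once it’s referenced several lines away from the for statement.

```python
from pathlib import Path


def describe_modifications(modified_paths, allowed_modifications):
    """Build one human-readable message per modified path."""
    messages = []
    for modified_path in modified_paths:
        if modified_path.name in allowed_modifications:
            status = "allowed"
        else:
            status = "unexpected"
        # Several lines below the for statement, the full name still reads clearly.
        messages.append(f"{modified_path.as_posix()}: {status}")
    return messages


# Assumed example data.
modified_paths = [Path(".gitignore"), Path("blog/models.py")]
print(describe_modifications(modified_paths, {".gitignore", "requirements.txt"}))
# ['.gitignore: allowed', 'blog/models.py: unexpected']
```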
Coming back to any()
If it’s not clear what this code does, focus on the first part of this version of the comprehension, the expression that comes before the for keyword:
[p.name not in allowed_modifications for p in modified_paths]
The expression p.name not in allowed_modifications will always evaluate to True or False. So we’ll end up with a list like this:
[False, False, True, False]
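Here’s a small runnable sketch that produces exactly that kind of list. The paths and the contents of allowed_modifications are made up for illustration.

```python
from pathlib import Path

# Assumed example data.
allowed_modifications = {"requirements.txt", ".gitignore", "simple_deploy_logs"}
modified_paths = [
    Path("requirements.txt"),
    Path(".gitignore"),
    Path("blog/models.py"),
    Path("simple_deploy_logs"),
]

# One boolean per path: True means the path is NOT an allowed modification.
flags = [p.name not in allowed_modifications for p in modified_paths]
print(flags)  # [False, False, True, False]
```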
Wrapping any() around this list returns True if any of the values in the list are True, and False otherwise:
>>> any([False, False, True, False])
True
The original code is a little hard to reason about out of context. If any of the modified files are not in the list of allowed modifications, the function returns False, indicating it’s not okay to proceed in configuring the user’s project. If every modified file is in that list, it will return True and we can proceed.
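If the double negative is hard to follow, one way to check the reasoning is to compare it against an equivalent positive phrasing. This is a sketch, not the project’s code: by De Morgan’s laws, "no path is outside the allowed list" means the same thing as "every path is inside it", so the any() version and an all() version should always agree. Both function names and the sample data are hypothetical.

```python
from pathlib import Path


def ok_to_proceed_any(modified_paths, allowed_modifications):
    """The original phrasing: bail out if any path is not allowed."""
    if any([p.name not in allowed_modifications for p in modified_paths]):
        return False
    return True


def ok_to_proceed_all(modified_paths, allowed_modifications):
    """An equivalent positive phrasing: proceed if all paths are allowed."""
    return all(p.name in allowed_modifications for p in modified_paths)


# Assumed example data.
allowed = {"requirements.txt", ".gitignore"}
cases = [
    [],
    [Path("requirements.txt")],
    [Path("requirements.txt"), Path("blog/models.py")],
]
for paths in cases:
    # The two phrasings give the same answer for every case.
    assert ok_to_proceed_any(paths, allowed) == ok_to_proceed_all(paths, allowed)
    print(ok_to_proceed_all(paths, allowed))
# True
# True
# False
```

Note that an empty list of modified paths, which corresponds to a clean git status, counts as okay to proceed under both phrasings.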
Conclusions
Naming things really is hard. I think when people say that, we often think about times when it was hard to come up with a descriptive name for an abstract concept we were working with. But many times there are smaller naming decisions that affect how readable our code is, especially to people who aren’t very familiar with the overall codebase.
When you’re writing a comprehension, consider using short names for the variables that only exist inside the comprehension. They should make sense to people reading your code, and draw attention to the more significant names that exist outside the comprehension itself. If you recognize other situations where a variable is only used in a single line, or in an otherwise isolated context, consider using short names there as well.
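A lambda passed as a sort key is one such isolated context. In this sketch (the data is made up), p exists only inside the call to sorted(), so the short name keeps attention on modified_paths and on the name attribute being sorted by.

```python
from pathlib import Path

# Assumed example data.
modified_paths = [Path("blog/models.py"), Path(".gitignore"), Path("requirements.txt")]

# p lives only inside this one expression.
paths_by_name = sorted(modified_paths, key=lambda p: p.name)
print([p.name for p in paths_by_name])
# ['.gitignore', 'models.py', 'requirements.txt']
```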
Don’t go overboard. For example, single-letter variable names in a standard for loop will probably make your code less readable.
1. The subtitle of this post is a play on an old programming joke that most readers have probably heard many times before. If you’re not familiar with this joke, here’s one variation:
There are two hard things in programming: naming things, cache invalidation, and off-by-one errors.
If you haven’t heard this before, I’m happy to be the first to share it with you. :)
2. Note that name in path.name is also used in the comprehension, but it’s not a variable name that we have control over. There’s no meaningful way to change the name of name.