A small set of special integers
MP #30: An interesting optimization in the CPython interpreter.
Note: I’ll be starting a new series in June about OOP (object-oriented programming) in Python. If you have any questions about OOP, or topics you’d like to see covered, please feel free to share them here and I’ll make a note to include them in the series. Thanks!
In the last post the idea of mutable and immutable objects played an important role. We saw that integers are immutable values. This is true, but there’s an interesting optimization that Python uses when it comes to certain integers that goes beyond immutability.
Comparing small integers
Consider two variables, each pointing to the same integer value:
>>> x = 5
>>> y = 5
Are these two variables equal?
>>> x == y
True
They certainly are, because the equality operator (==) compares the values associated with each variable. These values are the same, so the equality operator returns True.
Let’s take this comparison a step further, and ask if they point to the same object:
>>> x is y
True
The is operator compares the identities of two objects, which is equivalent to comparing their ids:
>>> id(x) == id(y)
True
In CPython, an object's id is its memory address, so this shows that these two variables point to the same memory address. They don't just have equal values; they are two references to the same object.
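The difference between == and is applies to every kind of object, not just integers. Here's a short sketch using lists. A list literal always creates a new list, so the results don't depend on any caching; the variable names are just for illustration.

```python
a = [1, 2, 3]
b = [1, 2, 3]  # a separate list that happens to have the same contents
c = a          # a second name for the list that a already refers to

print(a == b)          # True: the contents are equal
print(a is b)          # False: these are two different list objects
print(a is c)          # True: a and c refer to one object
print(id(a) == id(c))  # True: is compares identities, which id() exposes
```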
Comparing larger integers
Let’s try that again, with larger integers this time:
>>> x = 500
>>> y = 500
Are these two values equal?
>>> x == y
True
They are, as they should be. Again, the equality operator does not check whether each variable points to the same object. It only checks whether the two objects have the same value. Both x and y point to objects with the value 500, so they are equal.
But are they the same?
>>> x is y
False
They’re not!
Looking at ids
If you’re not already familiar with this behavior it’s worth looking at the ids of each variable, because they help clarify what’s happening:
>>> x = 5
>>> id(x)
4354432016
We make a variable with a small integer as its value. This points to the memory address 4354432016.
Now let’s make a second variable with this same value, and look at its id:
>>> y = 5
>>> id(y)
4354432016
For small integers, every new reference to the same value points to the same object in memory.
Python does this for efficiency reasons. Small integers are used often enough that it's worth creating these integer objects as soon as the Python interpreter starts, and then pointing any references to those values at the already-created objects.
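You can see the cache at work without relying on literals typed at the prompt. This sketch assumes CPython. It builds each integer at runtime with int(), so the two objects can't come from a shared constant; any sharing you see comes from the small-integer cache.

```python
# Assumes CPython: int() hands back the pre-created object for small values.
a = int("5")  # parse the value at runtime rather than using a literal
b = int("5")

print(a == b)          # True: equal values
print(a is b)          # True on CPython: both names get the cached 5 object
print(id(a) == id(b))  # True on CPython: same object, same id
```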
Let’s do the same thing with a larger integer value:
>>> x = 500
>>> id(x)
4347568944
We create a variable with the value 500. Python creates an object with that value, and x points at that object.
If we define a second variable with the same value, Python creates an entirely new object:
>>> y = 500
>>> id(y)
4347568912
For larger integers, Python creates a new object for each new variable. The variables x and y now point to separate objects that happen to have the same value.
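The same runtime construction shows the other side of this behavior. This sketch again assumes CPython: 500 is outside the cached range, so each int() call produces its own object, and the identity check fails even though the values match.

```python
# Assumes CPython: 500 is outside the small-integer cache.
a = int("500")  # each call creates a brand-new int object
b = int("500")

print(a == b)          # True: the values match
print(a is b)          # False on CPython: two separate objects
print(id(a) == id(b))  # False on CPython: both objects are alive, so their ids differ
```

This is also why numbers should be compared with == rather than is: is answers "are these the same object?", and for integers the answer depends on caching rather than on the values themselves.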
Note that the behavior is different if you point the two variables at the same object:
>>> x = 1000
>>> id(x)
4347569360
>>> y = x
>>> id(y)
4347569360
The expression y = x tells Python to point y at the same object that x points to. Integers are immutable, so y isn't going to track any changes you make to x. “Changing” x in this case actually rebinds x to a new object, leaving y pointing at whatever object it was originally pointing at.
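Here's a small sketch of that difference, contrasting an immutable int with a mutable list. With the int, the augmented assignment rebinds x to a new object; with the list, append() changes the one shared object, so both names see the change. The names are illustrative only.

```python
x = 1000
y = x          # y now refers to the same int object as x
x += 1         # ints are immutable, so this binds x to a new object (1001)
print(x, y)    # 1001 1000: y still refers to the original object

nums = [1, 2]
alias = nums    # alias refers to the same list object
nums.append(3)  # lists are mutable, so this changes the shared object in place
print(alias)    # [1, 2, 3]: the change is visible through both names
```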
The critical values
It turns out that integer values from -5 through 256 (inclusive of these two values) are pre-loaded when the Python interpreter starts up. These values are used so often that it’s worth that little bit of startup time to make most programs run more efficiently.
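If you'd like to check these bounds on your own interpreter, here's a rough sketch. It assumes CPython, and the helper name is_cached is hypothetical. For each value in a range it builds two integers at runtime and records which values come back as the same object. On CPython this should print -5 256 262.

```python
def is_cached(n: int) -> bool:
    """Return True if two ints with value n, built at runtime, are one object."""
    # int(str(n)) parses the value afresh each time, so any sharing
    # comes from the interpreter's small-integer cache, not from constants.
    return int(str(n)) is int(str(n))

# Scan a window somewhat wider than the expected cache.
cached = [n for n in range(-20, 300) if is_cached(n)]
print(cached[0], cached[-1], len(cached))
```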
A couple caveats
There are some caveats to all of this. First of all, this is an implementation detail of the CPython interpreter. CPython, the most commonly used Python interpreter, currently has this optimization, but not all Python interpreters work exactly this way, and the Python language specification does not require this behavior.
Also, you’ll see different behavior if you run this code within a single .py file:
x = 500
y = 500
print(x is y)
Here, the whole file is compiled at once. The compiler sees that the same constant value, 500, appears twice, stores it only once, and points both variables at that single object in memory. This code outputs True.
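You can watch this happen with the built-in compile() function, which turns source text into a code object much as the interpreter does for a file. This sketch assumes CPython; the source string stands in for the two-line file above. Equal constants within one code object are stored once in its co_consts tuple, so both assignments load the same object.

```python
source = "x = 500\ny = 500\n"  # the same two assignments as the file above

code = compile(source, "<example>", "exec")  # compile both lines as one unit
print(code.co_consts.count(500))  # 1: the constant 500 is stored only once

namespace = {}
exec(code, namespace)                    # run the module-level code
print(namespace["x"] is namespace["y"])  # True: both names got that one object
```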
Conclusions
Python is more than 30 years old at this point, and that’s a lot of time for people to notice small ways to improve the efficiency of the language. Many of these small optimizations are invisible most of the time. Understanding details like this, however, helps you better understand how Python works with objects and memory under the hood. A lot of what you learn by looking at an implementation detail like this clarifies what happens with more complex objects such as tuples, lists, dictionaries, and classes.
If you’re interested in reading more about this topic, look up the phrase “integer caching”.