A small set of special integers

MP #30: An interesting optimization in the CPython interpreter.

May 25, 2023

Note: I’ll be starting a new series in June about OOP (object-oriented programming) in Python. If you have any questions about OOP, or topics you’d like to see covered, please feel free to share them here and I’ll make a note to include them in the series. Thanks!

In the last post the idea of mutable and immutable objects played an important role. We saw that integers are immutable values. This is true, but there’s an interesting optimization that Python uses when it comes to certain integers that goes beyond immutability.

Comparing small integers

Consider two variables, each pointing to the same integer value:

>>> x = 5
>>> y = 5

Are these two variables equal?

>>> x == y
True

They certainly are, because the equality operator (==) compares the values associated with each variable. These values are the same, so the equality operator returns True.

Let’s take this comparison a step further, and ask if they point to the same object:

>>> x is y
True

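The is operator in Python checks whether two names refer to the same object, which comes down to comparing the ids of the two values. For objects that exist at the same time, this is equivalent to the following code: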

>>> id(x) == id(y)
True

This shows that these two variables point to the same memory address. They don’t just have equivalent values; they are different references to the same object.
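Here's a minimal sketch of the difference between equality and identity. The same_object() helper is a hypothetical name made up for this example, and the lists are only there because two separately built lists are always distinct objects, no matter which interpreter you use:

```python
def same_object(a, b):
    """Hypothetical helper: compare identities the long way, via id()."""
    return id(a) == id(b)


x = 5
y = 5
first = [1, 2, 3]
second = [1, 2, 3]  # equal value, but a separately built list
alias = first       # a second name for the same list object

for a, b in [(x, y), (first, second), (first, alias)]:
    # Columns: equal values? same object (is)? same object (id)?
    print(a == b, a is b, same_object(a, b))
```

The last two columns always agree with each other, because is and the id() comparison ask the same question. The first column can be True even when the other two are False.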

Comparing larger integers

Let’s try that again, with larger integers this time:

>>> x = 500
>>> y = 500

Are these two values equal?

>>> x == y
True

They are, as they well should be. Again, the equality operator does not check to see that each variable points to the same object. It only checks to see if the two objects have the same value. Both x and y point to objects with the value 500, so they are equal.

But are they the same?

>>> x is y
False

They’re not!

Looking at ids

If you’re not already familiar with this behavior it’s worth looking at the ids of each variable, because they help clarify what’s happening:

>>> x = 5
>>> id(x)
4354432016

We make a variable with a small integer as its value. In CPython, id() returns the memory address of the object, so this value lives at the address 4354432016.

Now let’s make a second variable with this same value, and look at its id:

>>> y = 5
>>> id(y)
4354432016

For small integers, every new reference to the same value points to the same place in memory:

Diagram showing x and y both pointing to a single value of 5. — For small integers, all references to that value point to the same object.

Python does this for efficiency reasons. It turns out that smaller integers are used often enough that it’s worth defining these integers as soon as the Python interpreter starts, and then pointing any references to those values at these already-defined objects.

Let’s do the same thing with a larger integer value:

>>> x = 500
>>> id(x)
4347568944

We create a variable with the value 500. Python creates an object with that value, and x points at that object.

If we define a second variable with the same value, Python creates an entirely new object:

>>> y = 500
>>> id(y)
4347568912

For larger integers, each of these separate assignments creates a new object. The variables x and y now point to separate objects that happen to have the same value:

Diagram showing x and y each pointing to separate instances of the value 500. — For larger integers, each new variable defined with that value results in a new object being created.
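If you want to see the same contrast outside of a terminal session, here's a rough sketch. It builds each integer at runtime with int() on a string, which is a choice made for this example so that every value is created fresh while the program runs. The identity results are what you'd expect from CPython; other interpreters may differ:

```python
import sys

# Build the integers at runtime instead of writing them as literals.
small_a = int("5")
small_b = int("5")
large_a = int("500")
large_b = int("500")

print(sys.implementation.name)
print(small_a == small_b, small_a is small_b)  # equal; shared object on CPython
print(large_a == large_b, large_a is large_b)  # equal; separate objects on CPython
```

The equality checks are True everywhere. Only the identity checks depend on the interpreter.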

Note that the behavior is different if you point the two variables at the same object:

>>> x = 1000
>>> id(x)
4347569360
>>> y = x
>>> id(y)
4347569360

The statement y = x tells Python to point y at the same object that x points to. All integers are immutable objects, so y isn’t going to track any changes you make to x. “Changing” x in this case would actually point x at a new object, leaving y pointing at whatever object it was originally pointing at.
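A short sketch of that last point, using the same values as above:

```python
x = 1000
y = x          # y now refers to the same object as x
print(x is y)  # True

x = x + 1      # builds a new int and points x at it
print(x, y)    # 1001 1000
print(x is y)  # False
```

The original object with the value 1000 was never modified. The name x just moved to a different object, and y stayed where it was.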

The critical values

It turns out that integer values from -5 through 256 (inclusive of these two values) are pre-loaded when the Python interpreter starts up. These values are used so often that it’s worth that little bit of startup time to make most programs run more efficiently.
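You can probe for these boundaries yourself. The sketch below is an experiment, not an official API: is_cached() and cached_range() are hypothetical names, and the approach assumes that int() on a string hands back the shared object whenever one exists. On the CPython versions this post describes, I'd expect it to report -5 and 256, but the exact range is an implementation detail and could differ in other versions or interpreters:

```python
def is_cached(n):
    """Return True if two runtime-built ints with value n are one object."""
    a = int(str(n))
    b = int(str(n))
    return a is b


def cached_range(limit=2000):
    """Scan outward from 0 and return the (lowest, highest) shared values."""
    low = 0
    while low - 1 >= -limit and is_cached(low - 1):
        low -= 1
    high = 0
    while high + 1 <= limit and is_cached(high + 1):
        high += 1
    return low, high


print(cached_range())
```

The limit parameter just keeps the scan from running forever on an interpreter that shares every integer.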

A couple caveats

There are some caveats to all of this. First of all, this is an implementation detail of the CPython interpreter. That means the current implementation of the most common Python interpreter has this optimization. Not all Python interpreters work exactly this way, and the Python language specification does not require this behavior.

Also, you’ll see different behavior if you run this code within a single .py file:

x = 500
y = 500
print(x is y)

Here, Python compiles the whole file before running any of it. The compiler sees that the same immutable value is used twice, and it stores that value only once, so both variables end up pointing at the same object in memory. This code outputs True.
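One way to watch this happen is to compile the source yourself, instead of saving it to a file. This is a sketch that assumes CPython; the "<example>" filename is just a placeholder label. Compiling the two assignments together mimics a .py file, while compiling them one at a time is closer to what happens in a terminal session:

```python
source = "x = 500\ny = 500\n"

# Compile both assignments as one unit, the way a .py file is compiled.
code = compile(source, "<example>", "exec")
file_namespace = {}
exec(code, file_namespace)
print(file_namespace["x"] is file_namespace["y"])  # True on CPython
print(code.co_consts.count(500))  # how many times 500 is stored as a constant

# Compile and run each assignment separately, like lines typed one by one.
session_namespace = {}
for line in source.splitlines():
    exec(compile(line, "<example>", "exec"), session_namespace)
print(session_namespace["x"] is session_namespace["y"])  # typically False
```

The co_consts tuple holds the constants for a compiled chunk of code. When both assignments are compiled together, 500 appears there only once, and that single object is what both names end up referring to.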

Conclusions

Python is more than 30 years old at this point, and that’s a lot of time for people to notice small ways to improve the efficiency of the language. Many of these small optimizations are invisible most of the time. Understanding details like this, however, helps you better understand how Python works with objects and memory under the hood. A lot of what you learn by looking at an implementation detail like this clarifies what happens with more complex objects such as tuples, lists, dictionaries, and classes.

If you’re interested in reading more about this topic, look up the phrase “integer caching”.

Mostly Python
