OOP in Python, part 8: Comparing objects
MP 47: Overriding Python's built-in comparison methods.
Note: This is the eighth post in a series about OOP in Python. The previous post discussed the __new__() method. The next post discusses helper methods.
When working with instances of a class, programmers often make comparisons between objects. If you have two instances of the same class, are they “equal”? What criteria are used to determine if they’re equal or not? Is one object “greater than” or “less than” another?
These kinds of comparisons are straightforward for numerical values, but you can define your own rules for comparing objects of any class. You can then tie those rules to the built-in comparison operators: ==, >, <, >=, and <=. You can define custom behavior for a number of other symbols as well, such as +, /, and more.
In this post we’ll look at the built-in methods that implement each of these operations, and write custom methods that add support for some of these symbols to our own classes.
“Greater than” is really a method
Let’s first look at how Python actually implements simple numerical comparisons. Consider this code:
>>> x = 5
>>> y = 3
>>> x > y
True
This is nice syntax to work with, because it matches the format most of us learned to use in early math classes. Most programming languages support this kind of symbol-based approach to comparing numerical values.
You might have heard this famous saying about Python:
In Python, everything is an object.
This means that every value in Python, even a plain integer like 5, is an instance of some class. If that’s the case, then it might be tempting to conclude that every action is defined by a method. That’s almost true, except that some actions are defined by functions that exist outside of any class.[1]
The operation that’s run when we use the > symbol is implemented as a method, called __gt__(). We can invoke it directly:
>>> x.__gt__(y)
True
This isn’t nearly as clear as the more familiar x > y syntax, which is why you don’t see it very often.
All of the comparison operators are implemented as methods with similar names:
x.__eq__(y)    x == y
x.__gt__(y)    x > y
x.__lt__(y)    x < y
x.__ge__(y)    x >= y
x.__le__(y)    x <= y
The important thing to note is that we can work with these same methods for any class that we write.
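To make that connection concrete, here’s a minimal sketch. The Loud class is hypothetical and exists only for this illustration; it counts how many times its __gt__() method runs, so we can see that the > symbol and the direct method call end up in the same place.

```python
class Loud:
    """Hypothetical class that counts every call to __gt__()."""

    def __init__(self, value):
        self.value = value
        self.calls = 0

    def __gt__(self, other):
        self.calls += 1
        return self.value > other.value


a, b = Loud(5), Loud(3)

print(a > b)         # True: Python called a.__gt__(b) for us
print(a.calls)       # 1
print(a.__gt__(b))   # True: the same method, invoked directly
print(a.calls)       # 2

# Built-in types return NotImplemented when they can't handle the other
# operand, which tells Python to try something else.
print((5).__gt__("five"))   # NotImplemented
```

The last line is worth remembering: a comparison method can return NotImplemented instead of True or False when it doesn’t know how to handle the other object.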
The River class
Let’s write our own custom class, and then define ways to compare two instances of the class.
Here’s a class that represents a river:
class River:

    def __init__(self, name, length=0):
        self.name = name
        # Length in km.
        self.length = length
This class only has two attributes, a name and a length. Let’s use this class in a terminal, where we don’t have to deal with a bunch of print() calls to see the output.
We’ll make two River objects, and see what happens if we try to compare them:
>>> from river import River
>>> yukon = River("Yukon", 3190)
>>> kuskokwim = River("Kuskokwim", 1130)
>>> yukon > kuskokwim
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'River' and 'River'
We define two rivers, yukon and kuskokwim. When we try to compare them with >, we get a TypeError. That’s to be expected, because Python has no idea what basis to use when deciding if one river is “greater” than another.
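Equality behaves differently from the ordering operators here. As a small sketch using the same River class: every class inherits a default == from object, which only checks whether both sides are the very same object. The yukon_copy name is just for this illustration.

```python
class River:

    def __init__(self, name, length=0):
        self.name = name
        # Length in km.
        self.length = length


yukon = River("Yukon", 3190)
yukon_copy = River("Yukon", 3190)

# The inherited == falls back to an identity check.
print(yukon == yukon)        # True
print(yukon == yukon_copy)   # False, even though the attributes match

# The ordering operators have no such fallback.
try:
    yukon > yukon_copy
except TypeError as e:
    print("TypeError:", e)
```

So == never fails on a new class, but it probably doesn’t mean what you want until you define it yourself.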
Comparing rivers
We can implement support for the > symbol by writing our own __gt__() method.
Let’s use the length of each river when making this kind of comparison:
class River:

    def __init__(self, name, length=0):
        ...

    def __gt__(self, river_2):
        return self.length > river_2.length
Here we’re defining how to make a comparison between two rivers using the > operator. Python should compare the length of each river, and return the resulting Boolean value. (If you’re running this code, make sure to do so in a new terminal session, so the updated version of the class is the one that gets imported.)
Now the comparison works:
>>> yukon > kuskokwim
True
>>> kuskokwim > yukon
False
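One side effect that’s easy to miss: with only __gt__() defined, the < operator already works between two rivers. When the left operand can’t handle <, Python tries the mirror-image call on the right operand, which is __gt__(). Here’s a short sketch of that behavior, using the class as written so far:

```python
class River:

    def __init__(self, name, length=0):
        self.name = name
        self.length = length

    def __gt__(self, river_2):
        return self.length > river_2.length


yukon = River("Yukon", 3190)
kuskokwim = River("Kuskokwim", 1130)

# River has no __lt__() of its own, so Python evaluates this as
# yukon.__gt__(kuskokwim).
print(kuskokwim < yukon)   # True

# There is no such shortcut from > to >=.
try:
    yukon >= kuskokwim
except TypeError as e:
    print("TypeError:", e)
```

Relying on this fallback is a bit implicit, so writing each method out explicitly is usually easier for readers to follow.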
We can support as many comparison operators as we want. The following methods add in support for using the less than (<) and equality (==) operators as well:
class River:

    def __init__(self, name, length=0):
        ...

    def __gt__(self, river_2):
        return self.length > river_2.length

    def __lt__(self, river_2):
        return self.length < river_2.length

    def __eq__(self, river_2):
        return self.length == river_2.length
Now we can make three different kinds of comparisons between River objects:
>>> yukon > kuskokwim
True
>>> yukon < kuskokwim
False
>>> yukon == kuskokwim
False
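If you want all of the comparison operators without writing every method by hand, the standard library’s functools.total_ordering decorator can fill in the rest from __eq__() and one ordering method. The version below is a sketch of one way to do that, not the class from this post: the isinstance() checks and the __hash__() method are additions of mine. Returning NotImplemented keeps a comparison against a non-River object from crashing with an AttributeError, and __hash__() is there because defining __eq__() on a class removes the default hash.

```python
from functools import total_ordering


@total_ordering
class River:

    def __init__(self, name, length=0):
        self.name = name
        # Length in km.
        self.length = length

    def __eq__(self, other):
        if not isinstance(other, River):
            return NotImplemented
        return self.length == other.length

    def __lt__(self, other):
        if not isinstance(other, River):
            return NotImplemented
        return self.length < other.length

    def __hash__(self):
        # Keep the hash consistent with __eq__(), so rivers can still
        # be used in sets and as dictionary keys.
        return hash(self.length)


yukon = River("Yukon", 3190)
kuskokwim = River("Kuskokwim", 1130)

print(yukon >= kuskokwim)   # True, __ge__() was supplied by total_ordering
print(yukon <= kuskokwim)   # False
print(yukon == 3190)        # False, rather than an AttributeError
print(max(yukon, kuskokwim).name)                     # Yukon
print([r.name for r in sorted([yukon, kuskokwim])])   # ['Kuskokwim', 'Yukon']
```

Once the ordering methods exist, built-in tools like sorted(), min(), and max() work on River objects with no extra code.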
Even more built-in methods
Almost every operation you can carry out with a symbol in Python has a corresponding method that can be overridden. For example, the common mathematical operations are backed by methods, just as the comparison operators are.
Let’s look at addition between two numbers:
>>> x = 5
>>> y = 3
>>> x + y
8
>>> x.__add__(y)
8
We almost always use the + symbol when adding numbers in Python, but __add__() does the same thing.
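The same hook is available to our own classes. As a minimal sketch, here’s a hypothetical Distance class (not part of the River example) that supports + by defining __add__():

```python
class Distance:
    """Hypothetical class for lengths in km, used only for illustration."""

    def __init__(self, km):
        self.km = km

    def __add__(self, other):
        # Only Distance + Distance is supported in this sketch.
        if not isinstance(other, Distance):
            return NotImplemented
        return Distance(self.km + other.km)

    def __repr__(self):
        return f"Distance({self.km})"


total = Distance(3190) + Distance(1130)
print(total)   # Distance(4320)
```

Returning a new Distance, rather than modifying either operand, matches how + behaves for numbers.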
Here are some of the numerical operations you’ve probably seen before, and the corresponding method for each one:
x + y     x.__add__(y)
x - y     x.__sub__(y)
x * y     x.__mul__(y)
x / y     x.__truediv__(y)
x // y    x.__floordiv__(y)
Note that there are several kinds of division in Python, as there are in most programming languages. True division is the division most people learn in school; in Python it returns a float, so 7 / 2 is 3.5. Floor division rounds the result down to the nearest whole number, so 7 // 2 is 3. For positive numbers, that’s the same as dropping the decimal part of the result.[2]
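A few quick checks make the difference between the division operators easier to see. Note what happens with a negative number: floor division rounds down, not toward zero.

```python
print(7 / 2)     # 3.5
print(7 // 2)    # 3
print(-7 // 2)   # -4, rounds down rather than toward zero

print(7 % 2)          # 1
print(divmod(7, 2))   # (3, 1)

# The same operations, calling the underlying methods directly.
print((7).__truediv__(2))    # 3.5
print((7).__floordiv__(2))   # 3
```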
Real-world example: combining paths
If a programming language allows you to override the behavior associated with a symbol, we say that the language supports operator overloading. That is, each operator can be loaded with more than one meaning. The action associated with a symbol then depends on the context in which it’s used.
The pathlib module contains a nice real-world example of operator overloading, implemented by overriding one of the built-in methods. pathlib is a newer part of the standard library (it was added in Python 3.4), which makes working with file paths much easier and more intuitive, with a cleaner syntax as well.
One of the nicest features of pathlib is the ease with which we can combine paths. Consider the following snippet, where we define a root path to a projects/ directory and then two separate paths to more specific directories:
from pathlib import Path
projects_root = Path("/Users/eric/projects")
mp_dir = projects_root / "mostly_python"
dsd_dir = projects_root / "django-simple-deploy"
This is really nice syntax, which mirrors how we write paths on Linux and macOS systems.[3]
The use of the forward slash symbol to join paths is implemented by overriding __truediv__() in the pathlib codebase. Here’s the relevant part of pathlib.py:
class PurePath:
    ...

    def __truediv__(self, key):
        try:
            return self.joinpath(key)
        except TypeError:
            return NotImplemented
Whenever a user inserts the / symbol between two Path objects, or between a Path object on the left and a string on the right, this custom __truediv__() method is called. This method then calls another method, self.joinpath(), which combines the parts of the path in a way that’s appropriate for the OS where the code is being executed.
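Here’s a short sketch of that behavior in action. It uses PurePosixPath rather than Path so the output is the same on every OS; that choice, and the "archive" and "2023" path parts, are just for this illustration.

```python
from pathlib import PurePosixPath

projects_root = PurePosixPath("/Users/eric/projects")

# Path on the left: PurePath.__truediv__() handles the join.
mp_dir = projects_root / "mostly_python"
print(mp_dir)   # /Users/eric/projects/mostly_python

# The operator and the method it wraps give equal results.
print(mp_dir == projects_root.joinpath("mostly_python"))   # True

# String on the left: str doesn't define __truediv__(), so Python falls
# back to the reflected method, PurePath.__rtruediv__().
print("archive" / PurePosixPath("2023"))   # archive/2023

# An operand that can't be part of a path makes __truediv__() return
# NotImplemented, and Python turns that into a TypeError.
try:
    projects_root / 42
except TypeError as e:
    print("TypeError:", e)
```

This is the same NotImplemented convention that the built-in numeric types use for operands they can’t handle.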
Conclusions
In programming, symbols are valuable resources. They’re shorthand for common operations, and they can be used for many purposes. Rather than hard-coding the use of symbols for very specific purposes, Python has a set of methods associated with most symbols that you can override when writing your own classes. This gives you tremendous flexibility in developing user-friendly actions between instances of your classes.
While it’s true that you can override most symbols to do just about anything you want, it’s good to think about what associations users might intuitively make between any given symbol, and the context of your class. Ask yourself these kinds of questions when figuring out actions to associate with different symbols:
- What kinds of comparisons might users want to make between instances of this class? Implement those comparisons using the operators >, <, >=, <=, and ==.
- What might it mean to combine instances of this class? Consider implementing that by overriding methods like __add__().
- What do these symbols already mean in the context you’re working in? If a symbol has a context-specific meaning that users are already familiar with, consider implementing that behavior by overriding any of the methods discussed here.
If you do this thoughtfully you can develop a clean, intuitive interface for users when they’re working with instances of the classes you write.
For a thorough understanding of all the available methods you can override when writing your own classes, take a moment to skim through the entire Data model documentation page. You certainly don’t need to read all of it, but skimming the entire page will help you know what’s possible when implementing new behaviors in a class, or cleaning up the syntax for working with existing classes.
Resources
You can find the code files from this post in the mostly_python GitHub repository.
[1] You can tell which actions are methods and which are functions by how they’re invoked. For example, converting a string to uppercase is implemented by a method, because we use the dot notation when making the conversion:
>>> greeting = "hello"
>>> greeting.upper()
'HELLO'
Determining the length of a string is implemented by a function, because we call it without attaching the function name to an object:
>>> len(greeting)
5
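Even here, a method is doing the work behind the scenes: len() calls the object’s __len__() method. That means we can make len() work on our own classes too. The RiverList class below is hypothetical, a minimal sketch for illustration only.

```python
class RiverList:
    """Hypothetical container class, used only for illustration."""

    def __init__(self, names):
        self.names = list(names)

    def __len__(self):
        return len(self.names)


greeting = "hello"
print(len(greeting))        # 5
print(greeting.__len__())   # 5

alaska_rivers = RiverList(["Yukon", "Kuskokwim"])
print(len(alaska_rivers))   # 2
```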
[2] There’s also __mod__(), which backs the % operator and returns the remainder of a division operation. The method __divmod__(), which backs the built-in divmod() function, returns both the quotient and the remainder of a division operation.
[3] Here’s what this same code used to look like, before pathlib was developed:
import os
projects_root = "/Users/eric/projects"
mp_dir = os.path.join(projects_root, "mostly_python")
dsd_dir = os.path.join(projects_root, "django-simple-deploy")
This worked, but it was much less readable than the syntax that pathlib introduced.
Also, Windows users may be annoyed at having to use a Linux and macOS convention. But the backslash has long been used as an escape character, which often causes problems when working with strings and paths.
If you’re a Windows user, you should still use forward slashes to write paths in your Python code. pathlib will convert these to backslashes appropriately when your code is executed. This is one of the abstractions that makes Python a cross-platform language.
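You can see this conversion on any OS by using pathlib’s “pure” path classes, which apply one system’s rules without touching the filesystem. This is a small sketch; the C: drive prefix is an assumption added for the Windows example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Written with forward slashes, displayed with Windows separators.
win_root = PureWindowsPath("C:/Users/eric/projects")
print(win_root / "mostly_python")
# C:\Users\eric\projects\mostly_python

# The same join under Linux and macOS rules.
posix_root = PurePosixPath("/Users/eric/projects")
print(posix_root / "mostly_python")
# /Users/eric/projects/mostly_python
```

In everyday code you’d just use Path, which picks the right flavor for the OS the code is running on.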