OOP in Python, part 5: Class methods
MP 42: Dealing with data that applies to multiple instances.
Note: This is the fifth post in a series about OOP in Python. The previous post discussed static methods. The next post covers the __str__() and __repr__() methods.
So far in this series we’ve talked about the __init__() method and static methods. There’s one more special kind of method you should know about: class methods. Class methods are useful when you want to work with information that’s associated with a class, but not associated with one specific instance.
Bonsai trees
Let’s consider a class that represents Bonsai trees:
class BonsaiTree:

    def __init__(self, name, description=""):
        self.name = name
        self.description = description

    def describe_tree(self):
        msg = f"{self.name}: {self.description}"
        print(msg)

tree = BonsaiTree("Winged Elm")
tree.description = "tall, solid trunk"
tree.describe_tree()
When you create an instance of BonsaiTree, you need to provide a name, and you can provide a description as well.
Here’s the output for a single tree:
Winged Elm: tall, solid trunk
This class is what we typically think of in basic OOP: the class lets us create instances, and the entire focus is on the kinds of things we might need to do with individual instances.
But what if we want to work with information that affects more than one instance?
More than one tree
Many Bonsai trees are part of a collection, so let’s add a couple more trees:
class BonsaiTree:
    ...

trees = []

tree = BonsaiTree("Winged Elm")
tree.description = "tall, solid trunk"
trees.append(tree)

tree = BonsaiTree("El Arbol Murcielago")
tree.description = "short, open trunk"
trees.append(tree)

tree = BonsaiTree("Mt. Mitchell")
tree.description = "small mountain forest"
trees.append(tree)

for tree in trees:
    tree.describe_tree()
We make three instances of BonsaiTree, and append each one to the list trees. Now we have three trees:
Winged Elm: tall, solid trunk
El Arbol Murcielago: short, open trunk
Mt. Mitchell: small mountain forest
Tracking instances
Imagine you’re actually using this class to track specimens in an exhibit. You might want to keep track of how many instances of BonsaiTree have been created.
The number of trees that have been added to the collection is information that’s relevant to the class, but it’s not associated with any one specific instance. When you have information like this, it should be stored in an attribute that’s associated with the overall class, not any instance.
Here’s how we do that:
class BonsaiTree:
    num_trees = 0

    @classmethod
    def count_trees(cls):
        msg = f"We have {cls.num_trees} trees in the collection."
        print(msg)

    def __init__(self, name, description=""):
        self.name = name
        self.description = description
        BonsaiTree.num_trees += 1

    def describe_tree(self):
        ...

trees = []
tree = BonsaiTree("Winged Elm")
...
for tree in trees:
    tree.describe_tree()
BonsaiTree.count_trees()
We first add an attribute called num_trees. This attribute is added outside of the __init__() method. Notice that num_trees doesn’t have a prefix; it’s not called self.num_trees. An attribute with no prefix is associated with the overall class, and it points to one value. Attributes prefixed with self are associated with specific instances, and have distinct values associated with each instance.
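As a minimal sketch of that difference, the stripped-down class below (a simplified stand-in for BonsaiTree, with a made-up second tree) shows where each kind of attribute is stored. It uses the built-in vars(), which returns an object's own attribute dictionary.

```python
class BonsaiTree:
    # Class attribute: one value, stored on the class itself.
    num_trees = 0

    def __init__(self, name):
        # Instance attribute: each instance stores its own value.
        self.name = name


elm = BonsaiTree("Winged Elm")
maple = BonsaiTree("Trident Maple")  # hypothetical second tree

# Each instance's own dictionary holds only its instance attributes.
print(vars(elm))    # {'name': 'Winged Elm'}
print(vars(maple))  # {'name': 'Trident Maple'}

# num_trees lives in the class's dictionary, not in any instance's.
print("num_trees" in vars(BonsaiTree))  # True
print("num_trees" in vars(elm))         # False
```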
To write a class method, use the @classmethod decorator. This decorator passes an argument representing the overall class to the method it decorates. By convention, class methods typically use cls as the first parameter.[1] Inside the method, you can use cls.attribute_name to access the value of any class attribute such as num_trees. Here, we use cls.num_trees to compose a single sentence informing people how many trees have been added to the collection.[2]
To make use of this, we need to increment the value of num_trees whenever a new instance of BonsaiTree is created. We can do that in __init__():
    def __init__(self, name, description=""):
        ...
        BonsaiTree.num_trees += 1
The __init__() method doesn’t receive a cls argument, so it needs to access num_trees through the name of the class. Now, whenever __init__() is called as a new instance is being made, num_trees will be increased by 1.
Finally, we call this method outside the class using the class name:
BonsaiTree.count_trees()
The output shows how many instances have been created:
Winged Elm: tall, solid trunk
El Arbol Murcielago: short, open trunk
Mt. Mitchell: small mountain forest
We have 3 trees in the collection.
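For readers who want to run the listing, here is one way to assemble the elided pieces into a single script, using the method body and tree data from the earlier listings. The loop over name and description pairs is a shortcut that the post itself does not use.

```python
class BonsaiTree:
    num_trees = 0

    @classmethod
    def count_trees(cls):
        msg = f"We have {cls.num_trees} trees in the collection."
        print(msg)

    def __init__(self, name, description=""):
        self.name = name
        self.description = description
        # Update the shared counter through the class name.
        BonsaiTree.num_trees += 1

    def describe_tree(self):
        msg = f"{self.name}: {self.description}"
        print(msg)


# Tree data taken from the earlier listing.
tree_data = [
    ("Winged Elm", "tall, solid trunk"),
    ("El Arbol Murcielago", "short, open trunk"),
    ("Mt. Mitchell", "small mountain forest"),
]

trees = [BonsaiTree(name, description) for name, description in tree_data]

for tree in trees:
    tree.describe_tree()

BonsaiTree.count_trees()
```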
A variety of approaches
The syntax used in the previous listing is the most appropriate for this small example. However, there are a variety of ways to work with class attributes and methods. This gives you a lot of flexibility in how you work with the information in a class, but it also creates some things to watch out for.
Using the name of the class in a class method
A class method automatically gets a reference to the class, but you can also use the name of the class to access class attributes. For example, this would work:
    @classmethod
    def count_trees(cls):
        msg = f"We have {BonsaiTree.num_trees} trees in the collection."
        print(msg)
There’s no reason to do this, but it’s good to be aware that this syntax won’t cause an error.
Calling a class method through an instance
You can access class methods through individual instances. For example, this code runs:
tree = BonsaiTree("Winged Elm")
tree.description = "tall, solid trunk"
tree.count_trees()
This will generate the same output as calling BonsaiTree.count_trees().
You might need to use this approach if the code you’re working on receives an instance of a class, but the module you’re working in doesn’t have direct access to the overall class. There’s no need to import the class in the module; you can just call the class method through the instance you’ve received.
This kind of flexibility is good, because when people are working with an instance, they shouldn’t have to think about how the class was written. They don’t need to keep track of which methods are instance methods, static methods, or class methods. They just need to know what they can do with objects, and leave the implementation details up to the library maintainers.
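A small sketch of that situation follows. The helper report_collection() is hypothetical; imagine it lives in a module that never imports BonsaiTree and only ever receives instances.

```python
class BonsaiTree:
    num_trees = 0

    @classmethod
    def count_trees(cls):
        print(f"We have {cls.num_trees} trees in the collection.")

    def __init__(self, name, description=""):
        self.name = name
        self.description = description
        BonsaiTree.num_trees += 1


def report_collection(tree):
    # Hypothetical helper: it never refers to BonsaiTree by name,
    # yet it can still call the class method through the instance.
    tree.count_trees()


tree = BonsaiTree("Winged Elm", "tall, solid trunk")
report_collection(tree)  # We have 1 trees in the collection.

# Even when reached through an instance, the method is bound to the class.
print(tree.count_trees.__self__ is BonsaiTree)  # True
```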
Accessing class attributes from within regular methods
You can access class attributes from within regular methods.[3] For example, we might expand the describe_tree() method to emphasize that this is one tree in a larger collection:
    def describe_tree(self):
        msg = f"{self.name}: {self.description}"
        msg += f"\n This is one of {BonsaiTree.num_trees} trees."
        print(msg)
This is a regular instance method, receiving the self argument. It needs self, because it needs access to the current tree’s name and description. But we can also grab the value of num_trees in this method by using the class name syntax.
The output describes the individual tree, and reports how large the overall collection is as well:
Winged Elm: tall, solid trunk
This is one of 3 trees.
...
This flexibility allows you to grab whatever information you need about an instance, or the overall class, when working inside a method.
Some things to watch out for
All of this flexibility is helpful for modeling complex real-world things, but it can make for some confusing behavior as well. For example, you can sometimes read class variables through self, but it’s usually not a good idea.
Reading class variables using self
In a regular instance method, we’ve seen that you can read the information from a class variable using the name of the class. However, in our example, this code also works:
    def describe_tree(self):
        msg = f"{self.name}: {self.description}"
        msg += f"\n This is one of {self.num_trees} trees."
        print(msg)
This generates the same output as the previous example, even though num_trees is being accessed through self. This works because no instance has an attribute of its own called num_trees, so Python falls back to the class attribute of that name.
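The fallback can be seen directly in a short sketch. It uses a trimmed version of the class, and the names first and second are just illustrative.

```python
class BonsaiTree:
    num_trees = 0

    def __init__(self, name):
        self.name = name
        BonsaiTree.num_trees += 1


first = BonsaiTree("Winged Elm")
second = BonsaiTree("El Arbol Murcielago")

# Neither instance stores num_trees itself.
print("num_trees" in vars(first))  # False

# Reading through an instance falls back to the class attribute, so
# the first tree sees the count that was updated after it was made.
print(first.num_trees)       # 2
print(second.num_trees)      # 2
print(BonsaiTree.num_trees)  # 2
```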
Writing data through self creates a distinct instance variable
Here’s where the confusion comes in: when you write to a variable using self, you create an instance variable if one doesn’t already exist. This version of the class does not do what we want:
class BonsaiTree:
    num_trees = 0

    @classmethod
    def count_trees(cls):
        msg = f"We have {cls.num_trees} trees in the collection."
        print(msg)

    def __init__(self, name, description=""):
        self.name = name
        self.description = description
        self.num_trees += 1

    def describe_tree(self):
        msg = f"{self.name}: {self.description}"
        msg += f"\n This is one of {self.num_trees} trees."
        print(msg)

trees = []
tree = BonsaiTree("Winged Elm")
...
for tree in trees:
    tree.describe_tree()
BonsaiTree.count_trees()
Here we’re trying to use self.num_trees when incrementing the counter in __init__(). But writing to self.num_trees doesn’t modify the existing class variable num_trees. Instead, Python reads the current class value, adds 1, and stores the result in a new instance attribute called num_trees, attached to self. Now every instance of BonsaiTree will have two versions of num_trees: one that’s associated with the overall class, and one that’s associated with itself.
That sounds confusing, and it is. Here’s the output:
Winged Elm: tall, solid trunk
This is one of 1 trees.
El Arbol Murcielago: short, open trunk
This is one of 1 trees.
Mt. Mitchell: small mountain forest
This is one of 1 trees.
We have 0 trees in the collection.
Every instance’s version of num_trees is 1, and the overall class attribute is never modified from its initial value of 0.
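To see where the two versions end up, here is a trimmed sketch of the buggy class that inspects each instance's own attributes with the built-in vars(). Only two trees are created to keep the output short.

```python
class BonsaiTree:
    num_trees = 0

    def __init__(self, name):
        self.name = name
        # Buggy on purpose: reads the class value (0), adds 1, and
        # stores the result as a new attribute on this instance.
        self.num_trees += 1


trees = [BonsaiTree("Winged Elm"), BonsaiTree("El Arbol Murcielago")]

for tree in trees:
    print(vars(tree))
# {'name': 'Winged Elm', 'num_trees': 1}
# {'name': 'El Arbol Murcielago', 'num_trees': 1}

# The class attribute was never touched.
print(BonsaiTree.num_trees)  # 0
```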
Rather than trying to explain this in even more detail, there’s a simple takeaway that will help avoid all this confusion. When you’re working with a class variable inside a class, always access it through cls in a class method, or through the class name in regular methods. And try to avoid having class attributes and instance attributes with the same name whenever possible.[4]
Real-world example: pathlib.py
If you’ve worked with files in Python at all, you’re probably somewhat familiar with pathlib. (If you’re still using strings to represent paths, set aside some time to read about pathlib.Path objects; your work with files and paths will be much nicer.)
At the time of writing, the pathlib library is implemented in a file called pathlib.py. That file contains four class methods. Here’s one of them:
    @classmethod
    def _parse_path(cls, path):
        if not path:
            return '', '', []
        sep = cls.pathmod.sep
        ...
        return drv, root, parsed
The leading underscore in _parse_path() indicates that it’s a helper method, used by other pathlib code to parse paths. One piece of information the method needs access to is sep, the path separator that’s been identified on the user’s OS. For example, the path separator is a forward slash on macOS and Linux, and a backslash on Windows.
When given a complete path, this method returns the path’s drive, its root, and a list of the remaining parts.
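Since _parse_path() is a private helper whose details can change between Python versions, it's safer to look at the same kind of split through pathlib's public attributes. This sketch uses PurePosixPath and PureWindowsPath so it runs the same way on any OS; the example paths are made up, and exactly how these attributes relate to the helper internally is not something the post spells out.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Made-up example paths, one per path flavor.
posix_path = PurePosixPath("/usr/local/bin")
windows_path = PureWindowsPath(r"C:\Users\eric\trees.txt")

# POSIX paths have no drive, and use a forward slash as the root.
print(repr(posix_path.drive))  # ''
print(posix_path.root)         # /
print(posix_path.parts)        # ('/', 'usr', 'local', 'bin')

# Windows paths have a drive, and use a backslash as the root.
print(windows_path.drive)      # C:
print(windows_path.root)       # \
print(windows_path.parts)      # ('C:\\', 'Users', 'eric', 'trees.txt')
```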
Conclusions
There’s a tremendous amount of flexibility in how you model things with code using OOP. That flexibility can be overwhelming at times, but it’s also what gives OOP the power that it has. If you use class methods where appropriate, your code will do what you need it to and the purpose of each method will be clear.
When the flexibility of OOP feels overwhelming, remember the following points:
- If you’re working with data that’s associated with individual instances, write a regular method with self as the first argument. If you need to work with a class attribute, access it using the name of the class.
- If you’re working with data that’s only associated with the overall class, write a class method and access the data through the cls argument.
- If you’re writing a method that doesn’t need any data associated with individual instances or the overall class, write a static method (see MP #41).
- When calling class methods, use the name of the class unless you only have access to an existing instance of the class.
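The points above can be sketched in one class that has a method of each kind. This is an illustration only: the methods return strings instead of printing so the results are easy to check, and format_name() is a hypothetical static method, not one from the series.

```python
class BonsaiTree:
    num_trees = 0

    def __init__(self, name, description=""):
        self.name = name
        self.description = description
        BonsaiTree.num_trees += 1

    def describe_tree(self):
        # Instance method: needs this tree's data, so it takes self.
        return f"{self.name}: {self.description}"

    @classmethod
    def count_trees(cls):
        # Class method: needs only class-level data, so it takes cls.
        return f"We have {cls.num_trees} trees in the collection."

    @staticmethod
    def format_name(name):
        # Static method (hypothetical): needs neither self nor cls.
        return name.strip().title()


tree = BonsaiTree(BonsaiTree.format_name("  winged elm "), "tall, solid trunk")

print(tree.describe_tree())      # Winged Elm: tall, solid trunk
print(BonsaiTree.count_trees())  # We have 1 trees in the collection.
```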
And as always, focus on code that works for your current situation, without getting lost in an attempt to write “perfect” code. Pick an approach that works for your use case, start to write tests when things are working, and be ready to refine your architecture as you and your project evolve.
Resources
You can find the code files from this post in the mostly_python GitHub repository.
[1] Just as self could be called potato, there’s nothing special about the name cls. It’s used because class is already a keyword. You could use potato here as well, as long as you were consistent in your usage. But please don’t do that. :)
[2] To clarify, the code shown here counts how many instances have been created. It doesn’t track deletions, and it doesn’t actively count the number of instances in memory.
[3] By “regular methods”, we’re really talking about instance methods. These are standard methods that receive the self variable as the first argument, and act on instance attributes.
[4] If you want a little more clarification, this is a great example to run through Python Tutor. If you do so, you’ll see four references to num_trees in the final visualization. One belongs to the overall class, and there is one reference to num_trees for each of the three instances. All of these are distinct values, pointing to four different places in memory.