Now, we'll go into more detail as to when we would want to do that.
When we have a built-in container object that we want to add functionality to, we have two options. We can either create a new object, which holds that container as an attribute (composition), or we can subclass the built-in object and add or adapt methods on it to do what we want (inheritance).
Composition is usually the best alternative if all we want to do is use the container to store some objects using that container's features. That way, it's easy to pass that data structure into other methods, and they will know how to interact with it. But we need to use inheritance if we want to change the way the container actually works. For example, if we want to ensure every item in a list is a string with exactly five characters, we need to extend list and override the append() method to raise an exception for invalid input. We'd also minimally have to override __setitem__(self, index, value), a special method on lists that is called whenever we use the x[index] = "value" syntax, and the extend() method.
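The overrides just described can be sketched as follows. The class name FiveCharList, the helper _validate, and the error message are assumptions for illustration; the original names no specific class:

```python
class FiveCharList(list):
    """A list that only accepts strings of exactly five characters."""

    @staticmethod
    def _validate(value):
        if not (isinstance(value, str) and len(value) == 5):
            raise ValueError(
                f"expected a five-character string, got {value!r}")

    def append(self, value):
        self._validate(value)
        super().append(value)

    def __setitem__(self, index, value):
        # Called for the l[index] = "value" syntax. For simplicity, this
        # sketch does not handle slice assignment (l[0:2] = ...).
        self._validate(value)
        super().__setitem__(index, value)

    def extend(self, values):
        values = list(values)  # materialize so we can validate everything first
        for value in values:
            self._validate(value)
        super().extend(values)
```

A watertight implementation would also need to override insert, __iadd__ (the += operator), and slice assignment, which is exactly why the chapter later suggests checking dir on the built-in type.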
Yes, lists are objects. All that special non-object-oriented looking syntax we've been using for accessing lists or dictionary keys, looping over containers, and similar tasks is actually "syntactic sugar" that maps to an object-oriented paradigm underneath. We might ask the Python designers why they did this. Isn't object-oriented programming always better? That question is easy to answer. In the following hypothetical examples, which is easier for a programmer to read? Which requires less typing?
c = a + b
c = a.add(b)

l[0] = 5
l.setitem(0, 5)

d[key] = value
d.setitem(key, value)

for x in alist:
    # do something with x

it = alist.iterator()
while it.has_next():
    x = it.next()
    # do something with x
The second form in each pair shows what object-oriented code might look like (in practice, these methods actually exist as special double-underscore methods on associated objects). Python programmers agree that the non-object-oriented syntax is easier both to read and to write. Yet all of the preceding Python syntaxes map to object-oriented methods underneath the hood. These methods have special names (with double underscores before and after) to remind us that there is a better syntax out there. However, this design gives us the means to override these behaviors. For example, we can make a special integer that always returns 0 when we add two of them together:
class SillyInt(int):
    def __add__(self, num):
        return 0
This is an extremely bizarre thing to do, granted, but it perfectly illustrates these object-oriented principles in action:
>>> a = SillyInt(1)
>>> b = SillyInt(2)
>>> a + b
0
The awesome thing about the __add__ method is that we can add it to any class we write, and if we use the + operator on instances of that class, it will be called. This is how string, tuple, and list concatenation works, for example.
This is true of all the special methods. If we want to use x in myobj syntax for a custom-defined object, we can implement __contains__. If we want to use the myobj[i] = value syntax, we supply a __setitem__ method, and if we want to use something = myobj[i], we implement __getitem__.
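As a quick sketch of how these three hooks fit together, here is a minimal custom container; the SimpleStore name and its dict-backed storage are illustrative assumptions, not from the original:

```python
class SimpleStore:
    """A minimal container wiring up the special methods discussed above."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        # Invoked by: store[key] = value
        self._data[key] = value

    def __getitem__(self, key):
        # Invoked by: something = store[key]
        return self._data[key]

    def __contains__(self, key):
        # Invoked by: key in store
        return key in self._data


store = SimpleStore()
store["answer"] = 42
print("answer" in store)   # True
print(store["answer"])     # 42
```

Note that we never call the double-underscore methods directly; the bracket and in syntaxes dispatch to them for us.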
There are 33 of these special methods on the list class. We can use the dir function to see all of them:
>>> dir(list)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
'__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__',
'__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__',
'__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__reversed__', '__rmul__', '__setattr__', '__setitem__',
'__sizeof__', '__str__', '__subclasshook__', 'append', 'count',
'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
Further, if we desire additional information on how any of these methods works, we can use the help function:
>>> help(list.__add__)
Help on wrapper_descriptor:

__add__(self, value, /)
    Return self+value.
The plus operator on lists concatenates two lists. We don't have room to discuss all of the available special methods in this module, but you are now able to explore all of this functionality with dir and help. The official online Python reference (https://docs.python.org/3/) has plenty of useful information as well. Focus, especially, on the abstract base classes in the collections.abc module.
So, to get back to the earlier point about when we would want to use composition versus inheritance: if we need to somehow change any of the methods on the class—including the special methods—we definitely need to use inheritance. If we used composition, we could write methods that do the validation or alterations and ask the caller to use those methods, but there is nothing stopping them from accessing the property directly. They could insert an item into our list that does not have five characters, and that might confuse other methods in the list.
Often, the need to extend a built-in data type is an indication that we're using the wrong sort of data type. It is not always the case, but if we are looking to extend a built-in, we should carefully consider whether or not a different data structure would be more suitable.
For example, consider what it takes to create a dictionary that remembers the order in which keys were inserted. One way to do this is to keep an ordered list of keys that is stored in a specially derived subclass of dict. Then we can override the methods keys, values, __iter__, and items to return everything in order. Of course, we'll also have to override __setitem__ and setdefault to keep our list up to date. There are likely to be a few other methods in the output of dir(dict) that need overriding to keep the list and dictionary consistent (clear and __delitem__ come to mind, to track when items are removed), but we won't worry about them for this example.
So we'll be extending dict and adding a list of ordered keys. Trivial enough, but where do we create the actual list? We could include it in the __init__ method, which would work just fine, but we have no guarantees that any subclass will call that initializer. Remember the __new__ method we discussed in Chapter 2, Objects in Python? I said it was generally only useful in very special cases. This is one of those special cases. We know __new__ will be called exactly once, and we can create a list on the new instance that will always be available to our class. With that in mind, here is our entire sorted dictionary:
from collections.abc import ItemsView, KeysView, ValuesView

class DictSorted(dict):
    def __new__(*args, **kwargs):
        new_dict = dict.__new__(*args, **kwargs)
        new_dict.ordered_keys = []
        return new_dict

    def __setitem__(self, key, value):
        '''self[key] = value syntax'''
        if key not in self.ordered_keys:
            self.ordered_keys.append(key)
        super().__setitem__(key, value)

    def setdefault(self, key, value):
        if key not in self.ordered_keys:
            self.ordered_keys.append(key)
        return super().setdefault(key, value)

    def keys(self):
        return KeysView(self)

    def values(self):
        return ValuesView(self)

    def items(self):
        return ItemsView(self)

    def __iter__(self):
        '''for x in self syntax'''
        return self.ordered_keys.__iter__()
The __new__ method creates a new dictionary and then puts an empty list on that object. We don't override __init__, as the default implementation works. (Actually, this is only true if we initialize an empty DictSorted object, which is the standard behavior. If we want to support the other variations of the dict constructor, which accept dictionaries or lists of tuples, we'd need to fix __init__ to also update our ordered_keys list.) The two methods for setting items are very similar; they both update the list of keys, but only if the item hasn't been added before. We don't want duplicates in the list, but we can't use a set here; it's unordered!
The keys, items, and values methods all return views onto the dictionary. The collections.abc module provides three read-only view classes for this; they use the __iter__ method to loop over the keys, and then use __getitem__ (which we didn't need to override) to retrieve the values. So, we only need to define our custom __iter__ method to make these three views work. You might think the superclass would create these views properly using polymorphism, but if we don't override these three methods, they don't return properly ordered views.
Finally, the __iter__ method is the really special one; it ensures that if we loop over the dictionary's keys (using for...in syntax), it will return the keys in the correct order. It does this by returning the __iter__ of the ordered_keys list, which returns the same iterator object that would be used if we used for...in on the list instead. Since ordered_keys is a list of all available keys (due to the way we overrode other methods), this is the correct iterator object for the dictionary as well.
Let's look at a few of these methods in action, compared to a normal dictionary:
>>> ds = DictSorted()
>>> d = {}
>>> ds['a'] = 1
>>> ds['b'] = 2
>>> ds.setdefault('c', 3)
3
>>> d['a'] = 1
>>> d['b'] = 2
>>> d.setdefault('c', 3)
3
>>> for k, v in ds.items():
...     print(k, v)
...
a 1
b 2
c 3
>>> for k, v in d.items():
...     print(k, v)
...
a 1
c 3
b 2
Ah, our dictionary returns its keys in insertion order, while the normal dictionary does not. Hurray!
If you wanted to use this class in production, you'd have to override several other special methods to ensure the keys stay up to date in all cases. However, you don't need to do this; the functionality this class provides is already available in Python, using the OrderedDict object in the collections module (and, since Python 3.7, the built-in dict itself preserves insertion order). Try importing the class from collections, and use help(OrderedDict) to find out more about it.
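For instance, a minimal session with the standard library class, mirroring the DictSorted example above, might look like this:

```python
from collections import OrderedDict

od = OrderedDict()
od['a'] = 1
od['b'] = 2
od.setdefault('c', 3)

# Keys come back in insertion order, just like our hand-rolled class
print(list(od.items()))  # [('a', 1), ('b', 2), ('c', 3)]

# Unlike DictSorted, OrderedDict also tracks deletions: a key that is
# removed and re-added moves to the end of the order
del od['a']
od['a'] = 10
print(list(od.keys()))  # ['b', 'c', 'a']
```

This deletion behavior is exactly the kind of edge case (clear, __delitem__, and friends) that our example class deliberately skipped.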