Once a site has a set of basic applications in working order, the next step is to add more advanced functionality to complement the existing behavior. This can sometimes be a matter of simply adding more applications, each providing new features for users and employees alike. Other times, there are ways of enhancing your existing applications so they grow new features directly, without a separate application that can stand on its own.
These "meta-applications" or "sub-frameworks" are built with the goal of easily integrating into an existing application, using hooks that are already provided. This book has illustrated many such hooks, and they can be used in combination to great effect. It's often possible to write a tool that performs a lot of tasks but only requires adding a single line of code to an existing application.
One common need among larger companies is surveillance of the company's data. It's important to know who is making changes; if anything goes wrong, there's someone who can be held accountable. This is also useful for smaller businesses, but the need is less urgent with fewer employees who have access to the data in question.
Nearly any organization of any size can benefit from knowing who last changed a model instance, but it's often left to "those other guys" who really need it. It doesn't have to be that way. By combining a few of the tools available in Django, recording the user who last changed a model instance becomes a fairly simple task.
On the surface, it seems ridiculously simple. A standard way to control what value gets saved for a field is to override the model's save()
method and set the value there. Chapter 10 showed how this can be used to calculate one field's value based on the values of other fields. It seems reasonable that this same technique could be used to insert the current user into a ForeignKey
field.
from django.db import models
from django.contrib.auth.models import User

class ImportantModel(models.Model):
    data = models.TextField()
    user = models.ForeignKey(User, null=True)

    def save(self):
        self.user = request.user
        super(ImportantModel, self).save()
There's just one problem: where does request
come from? Django's models are separated from views, so the request isn't just magically available inside a model method. One possible solution is to pass the user in manually. After all, the request isn't what's really important; it's the user we're after.
def save(self, user):
    self.user = user
    super(ImportantModel, self).save()
Then, when the time comes for a view to save an ImportantModel
instance, it goes something like this:
def important_view(request, data):
    # Form validation and cleaning
    obj = ImportantModel(data=form.cleaned_data['data'])
    obj.save(request.user)
That would certainly do the trick, but only in views that are specifically written to use this custom save()
method. All existing applications, including Django's own admin application, expect save()
to work without arguments. Writing a model with that kind of a save()
method will mean writing—or at least, modifying—all your own applications to work with it.
The core problem now shows itself: how do we get the current user into a model method without changing how the function gets called?
The most common approach for a long time was to make use of the fact that an incoming request is processed all the way through to the outgoing response in a single thread. Regardless of what server environment Django was running in, these simple facts remained: one thread processes one request at a time, and each request goes through just one thread. Coupled with Python's own threading module, a solution was born.
The threading module, which provides tools for working with threaded applications, includes a function called local(). This function returns an object that can be used as a namespace for a thread. Functions can write to it and read from it, without worrying about interfering with other threads. Each thread gets its own private set of values, a separation managed by Python itself.
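That per-thread isolation is easy to demonstrate with plain Python, no Django required. A minimal sketch (the attribute name user and the thread values are arbitrary):

```python
import threading

# Each thread sees its own independent attributes on a threading.local()
# instance; writes in one thread are invisible to every other thread.
namespace = threading.local()

def worker(value, results):
    namespace.user = value          # private to this thread
    results.append(namespace.user)  # reads back this thread's own value

results = []
threads = [threading.Thread(target=worker, args=(name, results))
           for name in ('alice', 'bob')]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never assigned namespace.user, so it sees nothing at all.
main_thread_user = getattr(namespace, 'user', None)
```

Each worker reads back exactly the value it wrote, while the main thread's view of the namespace remains empty.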
This feature was useful enough that Django itself bundled a version of it for compatibility with Python 2.3 (threading.local() was introduced in Python 2.4). That bundled copy was justification enough for many programmers to begin using a thread-local namespace to store the current user, which could then be retrieved by a model's save() method. Since the request is always available to middleware, we ended up with a middleware module that looked something like this:
try:
    from threading import local
except ImportError:
    # Fallback for Python 2.3
    from django.utils._threading_local import local

thread_namespace = local()

class ThreadLocalMiddleware(object):
    def process_request(self, request):
        # Set the current user in the local() dictionary
        thread_namespace.user = getattr(request, 'user', None)

    def process_response(self, request, response):
        # Clear the user now that the request is finished
        thread_namespace.user = None
        return response
After that, a model method such as save()
just has to import the thread_namespace
variable and read the user out of it.
from django.db import models
from django.contrib.auth.models import User
from thread_local_middleware import thread_namespace

class ImportantModel(models.Model):
    data = models.TextField()
    user = models.ForeignKey(User, null=True)

    def save(self):
        self.user = getattr(thread_namespace, 'user', None)
        super(ImportantModel, self).save()
Every time a model instance is saved, save()
updates the user
attribute to be the user who is currently logged in at the time. However, there are a few potential pitfalls with this approach.
First, threading.local()
becomes a bit of a dumping ground for data. Nothing actively manages this dictionary, so there's no central place to look to find out what might be in it. Any code can put things in there, without any clear way of indicating that it did so.
Taking that problem a step further, applications don't know what other code is already using threading.local()
, so there's no way to prevent name clashes. If two applications running in the same thread—perhaps a view that calls out to a third-party library as part of its processing—assign data to the same variable name, the later assignment wins. Even if an application actively checks for the presence of a name before assigning to it, there's no reasonable way to know if that application itself had previously assigned the variable or if another application is using the same name.
The worst part about these issues is that they're intermittent. Whether they cause any real problems depends entirely on what applications use threading.local()
and which ones of those are executed within the same thread. Everything could be working fine one day and break the next, because a new third-party library makes use of threading.local()
in a way you didn't anticipate.
Python's module-level global variables are at least safe enough that one module can't accidentally overwrite a global variable in another module—it's possible to do, but not accidentally. Without offering that level of protection, threading.local()
becomes a dangerous tool to rely on without fully understanding how all the code in your site is using it.
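The clash described above takes only a few lines to reproduce. Both "applications" below are invented stand-ins, running in the same thread and happening to pick the same attribute name:

```python
import threading

thread_namespace = threading.local()

def auth_app_stores_user():
    # One application stashes the current user for later retrieval.
    thread_namespace.user = 'admin'

def third_party_library():
    # An unrelated library, running in the same thread, happens to use
    # the same attribute name; the earlier value is silently replaced.
    thread_namespace.user = 'service-account'

auth_app_stores_user()
third_party_library()

# The first application now reads back the wrong value, with no error raised.
current_user = thread_namespace.user
```

No exception, no warning; the first application simply gets the wrong answer, and only when both pieces of code happen to run in the same thread.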
Instead of relying on thread locals, another approach is to consider the actual use case. Typically, people turn to thread locals as a way of storing the current user in a table while using the built-in admin application. Custom views make it easy enough to insert the user manually, so it's usually the admin interface that prompts developers to investigate these techniques.
Shortly before Django 1.0 was released, the admin application was overhauled in order to provide more flexibility and features. One of the additions was a result of the desire to access the current user while working in the admin. Now there is a way to override how the admin saves a given model, also providing access to the request object along the way. Remember the definition of ImportantModel
from the previous section:
from django.db import models
from django.contrib.auth.models import User

class ImportantModel(models.Model):
    data = models.TextField()
    user = models.ForeignKey(User, null=True)
By adding an admin.py
module to the application, it's possible to supply a new method to use when the admin attempts to save an instance of this model.
from django.contrib import admin
from important_app import models

class ImportantModelAdmin(admin.ModelAdmin):
    def save_model(self, request, instance, form, change):
        instance.user = request.user
        super(ImportantModelAdmin, self).save_model(request, instance, form, change)

admin.site.register(models.ImportantModel, ImportantModelAdmin)
Now, when a user adds a new instance of ImportantModel
or changes an existing instance, the admin will use this method, and the current user will be added to the instance accordingly. There's no need to worry about anonymous users, because the admin is only available to authenticated users.
This works quite well, following the "ideal" approach where the request is simply passed to those methods that need it. The only problem is that it only works for the admin application. Other applications that could benefit from storing the current user on a related model, without having to rewrite existing views, are still left needing another solution.
Another alternative approach is to keep track of the fields that need to contain the current user and update those fields whenever instances of their associated models are changed. This goes back to being a model-based approach, rather than being view-based, but it no longer requires an override of the save()
method.
The first step is to mark a field as needing the current user to be inserted when the model is saved. This task is traditionally handled by save()
or a view, but this approach will use a new type of field to manage it. This new CurrentUserField
will live in the models.py
module of a new current_user
application.
from django.db import models
from django.contrib.auth.models import User

class CurrentUserField(models.ForeignKey):
    def __init__(self, **kwargs):
        super(CurrentUserField, self).__init__(User, null=True, **kwargs)
It's currently little more than a specialized ForeignKey
that's been hard-coded to relate with Django's built-in User
model. It specifies null=True
to account for applications where the model may be edited by an anonymous user and other non-Web applications that might add or update records. As it stands, it would be usable in that regard alone, by simply replacing an existing ForeignKey
with a new CurrentUserField
. Here is how it looks on the ImportantModel
from the preceding sections:
from django.db import models
from current_user.models import CurrentUserField

class ImportantModel(models.Model):
    data = models.TextField()
    user = CurrentUserField()
Very little has changed in this incarnation: one field has a new type, and one import was updated accordingly. The notable effect of this simple change is that no additional changes will be necessary on the model after this point. You can go through and add CurrentUserField
to whatever models you like now, knowing that they'll continue to work while we work through the remainder of the code to support them properly.
The next thing to take care of is to keep a record of all the models that have CurrentUserField
instances attached to them. This is important for performance; without it, the user-updating code described later would have to look at every model that gets saved and cycle over its fields, looking for instances of CurrentUserField
. Instead, we can supply a registry of known instances that can speed things up considerably.
A new module, registration.py
, will contain the code necessary to maintain a record of every CurrentUserField
in use and supply information about that registry to other code that asks for it. It uses a slightly modified notion of the Borg pattern, looking fairly similar to the plugin architecture registry from Chapter 2.
class FieldRegistry(object):
    _registry = {}

    def add_field(self, model, field):
        reg = self.__class__._registry.setdefault(model, [])
        reg.append(field)

    def get_fields(self, model):
        return self.__class__._registry.get(model, [])

    def __contains__(self, model):
        return model in self.__class__._registry
The internal _registry
dictionary exists only on the FieldRegistry
class and is never copied out to any instances. All methods operate on that class-level dictionary, so it doesn't matter how many instances of FieldRegistry
get created. All instances will use the same dictionary all the time. Take a look at it in action:
>>> from current_user.registration import FieldRegistry
>>> from important_app.models import ImportantModel
>>> registry = FieldRegistry()
>>> registry.add_field(ImportantModel, ImportantModel._meta.get_field('user'))
>>> registry.get_fields(ImportantModel)
[<current_user.models.CurrentUserField object at 0x...>]
>>> another_registry = FieldRegistry()
>>> ImportantModel in another_registry
True
Note also that this allows for more than one field to be registered on a given model. Since Django allows the same field to be used multiple times on a single model, FieldRegistry
supports that as well. If just one instance of the field was stored in the registry, the last one assigned to the model would overwrite the first, which could cause confusion about what's going on behind the scenes. By explicitly supporting multiple fields per model, we can avoid that problem entirely.
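That multi-field behavior can be checked in plain Python, using the FieldRegistry class exactly as defined above, with strings standing in for field objects and a bare class standing in for a model:

```python
class FieldRegistry(object):
    _registry = {}

    def add_field(self, model, field):
        reg = self.__class__._registry.setdefault(model, [])
        reg.append(field)

    def get_fields(self, model):
        return self.__class__._registry.get(model, [])

    def __contains__(self, model):
        return model in self.__class__._registry

class Article(object):
    """Stand-in for a real model class."""

# Two separate "fields" registered for one model; neither overwrites the other.
FieldRegistry().add_field(Article, 'created_by')
FieldRegistry().add_field(Article, 'modified_by')

fields = FieldRegistry().get_fields(Article)
```

Both entries survive, and any FieldRegistry instance created later sees them, since all instances share the one class-level dictionary.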
The last step in the registration process is actually adding instances of CurrentUserField
to the registry when they're added to models. Remember from Chapter 3 that fields provide a contribute_to_class()
method that executes while Django processes a model's contents. Overriding that method on CurrentUserField
gives access to the model class as well as the name it was given, but the registry is only interested in the model and field objects.
from django.db import models
from django.contrib.auth.models import User

from current_user import registration

class CurrentUserField(models.ForeignKey):
    def __init__(self, **kwargs):
        super(CurrentUserField, self).__init__(User, null=True, **kwargs)

    def contribute_to_class(self, cls, name):
        super(CurrentUserField, self).contribute_to_class(cls, name)
        registry = registration.FieldRegistry()
        registry.add_field(cls, self)
Now CurrentUserField
can register itself on any model it's attached to, without any additional intervention from you, the developer; simply assigning it to a model will suffice. This registration is the sole purpose of CurrentUserField
, so its job is now done. All the rest of the work happens when a request is processed.
Like the thread-local approach, CurrentUserField
relies on a middleware class to get access to each incoming request and retrieve the current user. Middleware updates the fields without having to write your views specifically to do so, which opens it up for use in all applications, including the admin and other third-party applications where modifying code is problematic.
The real trick here is how to update CurrentUserField
records without resorting to thread locals, and the answer is signals. Since Django provides a pre_save
signal that fires just before an instance gets committed to the database, this new middleware can register a handler to execute at just the right time. These pieces come together in a new middleware.py
module in the current_user
application, starting with the workhorse: the update_users()
method.
from current_user import registration

class CurrentUserMiddleware(object):
    def update_users(self, user, sender, instance, **kwargs):
        registry = registration.FieldRegistry()
        if sender in registry:
            for field in registry.get_fields(sender):
                setattr(instance, field.name, user)
As a signal handler, it gets two arguments from the pre_save
handler: sender
and instance
; the user
argument will be supplied by the process_request()
method. Since sender
is the model whose instance is currently being saved, it can be used to check whether the model is registered as having any CurrentUserField
attributes. If so, it simply loops over them, setting the instance attribute for each one to the current user.
That won't do anything unless registered for the pre_save
signal, which is a job for the process_request()
method.
from django.db.models import signals
from django.utils.functional import curry

from current_user import registration

class CurrentUserMiddleware(object):
    def process_request(self, request):
        if hasattr(request, 'user') and request.user.is_authenticated():
            user = request.user
        else:
            user = None
        update_users = curry(self.update_users, user)
        signals.pre_save.connect(update_users, dispatch_uid=request, weak=False)

    def update_users(self, user, sender, instance, **kwargs):
        registry = registration.FieldRegistry()
        if sender in registry:
            for field in registry.get_fields(sender):
                setattr(instance, field.name, user)
This method starts out by checking whether the user is authenticated or not. Remember, CurrentUserField
uses null=True
to support anonymous users, so this step is necessary to make that distinction. By also checking to see if the request even has a user
attribute at all, this handles cases where the default AuthenticationMiddleware
is disabled or is placed after CurrentUserMiddleware
in the MIDDLEWARE_CLASSES
setting.
Continuing on, update_users()
is curried into a new function, with the current user preloaded as its first argument. The resulting function is now configured for use as a signal handler. This signal will be registered for every incoming request, since the only way to know the current user is to get it when the request comes in. It must be removed when the request is finished; otherwise, multiple signal handlers would be competing to update the same fields.
Since update_users()
is curried, there won't be a reference for it once process_request()
finishes executing. In order to keep it from being destroyed before it can be useful, it gets registered with weak=False
. Since the middleware doesn't get to keep a reference to the curried function, the dispatch_uid
argument provides an alternative reference for the handler. There will only be one signal handler for each incoming request, so the request object is a suitable unique identifier.
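Django's curry() behaves like the standard library's functools.partial, so the preloading step can be sketched without the framework. Here a plain dict stands in for a model instance, purely for illustration:

```python
from functools import partial

def update_users(user, sender, instance, **kwargs):
    # Same argument order as the middleware method: the curried user
    # comes first, then whatever the signal dispatcher supplies.
    instance['modified_by'] = user

# Preload the first argument, just as curry(self.update_users, user) does.
handler = partial(update_users, 'alice')

# The dispatcher later calls the handler with only sender and instance;
# the preloaded user rides along automatically.
record = {}
handler(sender=dict, instance=record)
```

The resulting handler carries its user with it, which is exactly why a fresh one must be connected for each request and disconnected afterward.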
Once the curried update_users()
is then registered on the pre_save
signal, Django continues on with other middleware and executes the view. Any models updated during that time will be checked by update_users()
and updated as necessary. Once the view finishes, Django enters the response phase of middleware processing, where CurrentUserMiddleware
needs to remove the listener, using the request to identify it.
from django.db.models import signals
from django.utils.functional import curry

from current_user import registration

class CurrentUserMiddleware(object):
    def process_request(self, request):
        if hasattr(request, 'user') and request.user.is_authenticated():
            user = request.user
        else:
            user = None
        update_users = curry(self.update_users, user)
        signals.pre_save.connect(update_users, dispatch_uid=request, weak=False)

    def update_users(self, user, sender, instance, **kwargs):
        registry = registration.FieldRegistry()
        if sender in registry:
            for field in registry.get_fields(sender):
                setattr(instance, field.name, user)

    def process_response(self, request, response):
        signals.pre_save.disconnect(dispatch_uid=request)
        return response
As mentioned, CurrentUserMiddleware
will register a signal handler for every request that Django processes, checking for instances of CurrentUserField
every time a model is saved within a request. On most small sites, this additional overhead is hardly noticeable, but high-volume sites may notice a reduction in the quality of the user experience. The benefits of data surveillance aren't worth degrading the experience provided to your users.
One way to keep overhead to a minimum is by restricting it to situations where updates are likely to take place. Chapter 7 explained how the HTTP standard expects certain methods to be "safe"—simply viewing a document shouldn't make any changes. These safe methods are GET, HEAD, OPTIONS and TRACE; process_request()
can be written to special-case these methods, bypassing any further handling.
from django.db.models import signals
from django.utils.functional import curry

from current_user import registration

class CurrentUserMiddleware(object):
    def process_request(self, request):
        if request.method in ('GET', 'HEAD', 'OPTIONS', 'TRACE'):
            # This request shouldn't update anything,
            # so no signal handler should be attached.
            return
        if hasattr(request, 'user') and request.user.is_authenticated():
            user = request.user
        else:
            user = None
        update_users = curry(self.update_users, user)
        signals.pre_save.connect(update_users, dispatch_uid=request, weak=False)

    def update_users(self, user, sender, instance, **kwargs):
        registry = registration.FieldRegistry()
        if sender in registry:
            for field in registry.get_fields(sender):
                setattr(instance, field.name, user)

    def process_response(self, request, response):
        signals.pre_save.disconnect(dispatch_uid=request)
        return response
Even among requests that do modify data, not all views modify the models that are being managed by CurrentUserField
. There's no way for a third-party application like this to programmatically know which models are modified by which views, so the middleware simply looks at all of them. This can be avoided by only applying this middleware on those views that you know modify the affected models.
To achieve this, we turn to django.utils.decorators
, which contains the useful decorator_from_middleware
function that was shown in Chapter 7. This utility function takes a middleware, like our CurrentUserMiddleware
, and converts it into a decorator that can be applied to just those views that need its features. This new decorator can be provided in the middleware module, right alongside the middleware it accesses.
from django.db.models import signals
from django.utils.functional import curry
from django.utils.decorators import decorator_from_middleware

from current_user import registration

class CurrentUserMiddleware(object):
    def process_request(self, request):
        if request.method in ('GET', 'HEAD', 'OPTIONS', 'TRACE'):
            # This request shouldn't update anything,
            # so no signal handler should be attached.
            return
        if hasattr(request, 'user') and request.user.is_authenticated():
            user = request.user
        else:
            user = None
        update_users = curry(self.update_users, user)
        signals.pre_save.connect(update_users, dispatch_uid=request, weak=False)

    def update_users(self, user, sender, instance, **kwargs):
        registry = registration.FieldRegistry()
        if sender in registry:
            for field in registry.get_fields(sender):
                setattr(instance, field.name, user)

    def process_response(self, request, response):
        signals.pre_save.disconnect(dispatch_uid=request)
        return response

record_current_user = decorator_from_middleware(CurrentUserMiddleware)
Now it's possible to import this decorator and apply it to the views that modify any of the models with a CurrentUserField
attached. One obvious example is the admin interface, which shouldn't be open to the general public and should therefore have a limited number of users. This does raise one last problem: the decorator produced by decorator_from_middleware
only works on functions, not on callable objects.
The admin site uses an object—typically django.contrib.admin.site.root
—as the view in URL configurations, so the record_current_user
decorator won't work with it directly. Instead, a small wrapper function needs to be placed between the two, which can satisfy the decorator while passing everything through to the admin site object.
from django.conf.urls.defaults import *
from django.contrib import admin

from current_user.middleware import record_current_user

admin.autodiscover()

urlpatterns = patterns('',
    (r'^admin/doc/', include('django.contrib.admindocs.urls')),
    (r'^admin/(.*)', record_current_user(lambda *args: admin.site.root(*args))),

    # The rest of the site gets configured here.
)
If the vast majority of the site's views do update models that have CurrentUserField
attributes—to be expected if you're tracking all the models in your applications—the programmer overhead of having to import and apply the decorator to every view may not be worth it. Since nearly all views would need the decorator, applying the middleware makes more sense in that situation.
Capturing the last user to make a change is useful to a point, but finding out more information requires talking to that user in person and asking what was changed. Worse yet, there's no record of who else changed anything previously, so there's no way to know what path a record took from beginning to end.
By bringing together even more of the techniques listed in this book—dynamic models, custom field-like objects, descriptors and curried functions for a start—we can supply a framework for tracking the changes of any model in any application under your control. This includes who changed the model, when it was changed and what it looked like at the time. In keeping with DRY, it's even possible to add this functionality to a model with a single line.
Managing the history of objects requires a fairly detailed application, and it can be hard to understand the end goal when looking at everything individually. This section provides an overview of the features that will be available when the application is completed, so you can start seeing these features fall into place as the code progresses.
First is the act of assigning a history manager to the model that will be archived. This should be as simple as possible, preferably just a single attribute assignment. Simply pick a name and assign an object, just like Django's own model fields.
from django.db import models
from django.contrib.auth.models import User

import history

class Contact(models.Model):
    user = models.OneToOneField(User)
    phone_number = models.CharField(max_length=15, blank=True)
    address = models.CharField(max_length=255, blank=True)
    city = models.CharField(max_length=255, blank=True)
    state = models.CharField(max_length=255, blank=True)
    zip_code = models.CharField('ZIP code', max_length=10, blank=True)

    history = history.HistoricalRecords()

    # Descriptors and methods are here (see Chapter 10)
That's enough to get everything configured. From there, the framework is able to set up a model behind the scenes to store old records and the history
attribute can be used to access those records using Django's standard database API methods. Consider a Contact
object for the author of this book:
>>> from contacts.models import Contact
>>> from django.contrib.auth.models import User
>>> author_user = User.objects.get(username='martyalchin')
>>> author = Contact.objects.create(user=author_user, state='MI')
>>> print '%s (%s)' % (author, author._meta.object_name)
Marty Alchin (Contact)
This object will function just as it normally would, but with the addition of the history
attribute, additional information about the author's history is available. To start, one historical record is available from when the Contact
object was first created.
>>> for record in author.history.all():
... print '%s (%s)' % (record, record._meta.object_name)
...
Marty Alchin as of 2008-10-08 16:09:57 (HistoricalContact)
Note: Historical records won't magically exist for data that was already in the database. Only new records and updates will get tracked. If you'd like to make a historical record for each row in your existing database, simply save them all one at a time, using a loop such as [c.save() for c in Contact.objects.all()]
. This may be problematic for large databases, where some custom SQL may be more appropriate.
If the contact changes his phone number, it's necessary to update the Contact record accordingly. That change also shows up as a new historical record.
>>> author.phone_number
u'555-555-5555'
>>> author.phone_number = '517-555-2424'
>>> author.save()
>>> for record in author.history.all():
... print u'%s (%s)' % (record, record.phone_number)
...
Marty Alchin as of 2008-10-08 16:46:51 (517-555-2424)
Marty Alchin as of 2008-10-08 16:09:57 (555-555-5555)
Notice that they're sorted with the most recent record first. This allows for some simple methods to be added, making it easier to get older copies. For instance, the history manager also has a most_recent()
method, which returns a Contact
object with its attributes set to those found in the most recent historical record.
>>> recent = author.history.most_recent()
>>> print '%s (%s)' % (recent, recent.phone_number)
Marty Alchin (517-555-2424)
Even though each historical record is a different model than the original, a true Contact
object is available from any HistoricalContact
by using the history_object
attribute.
>>> record = author.history.all()[0]
>>> print record
Marty Alchin as of 2008-10-08 16:46:51
>>> print type(record)
<class 'contacts.models.HistoricalContact'>
>>> print record.history_object
Marty Alchin
>>> print type(record.history_object)
<class 'contacts.models.Contact'>
In the event that a specific date is known, the historical manager also includes a shortcut function to return a Contact
object containing the values that were true for the given object as of the date specified.
>>> import datetime
# 'then' is just minutes before the last update
>>> then = datetime.datetime(2008, 10, 8, 16, 45)
>>> old_contact = author.history.as_of(then)
>>> print '%s (%s)' % (old_contact, old_contact.phone_number)
Marty Alchin (555-555-5555)
Even after a contact has been deleted, a record of it still remains. In fact, a new record is added to indicate when the contact was deleted. Given the original ID, an empty Contact
object can be used to retrieve historical records, including the most_recent()
method for a kind of "undo" functionality.
>>> author.delete()
>>> author = Contact(pk=1) # Note: not retrieved from the database
>>> old_contact = author.history.most_recent()
>>> print '%s (%s)' % (old_contact, old_contact.phone_number)
Marty Alchin (517-555-2424)
Each historical record is identified as one of three types: created, changed or deleted.
>>> for record in author.history.all():
... print u' %s %s (%s)' % (record.history_type, record,
... record.get_history_type_display())
...
− Marty Alchin as of 2008-10-08 17:19:13 (Deleted)
~ Marty Alchin as of 2008-10-08 16:46:51 (Changed)
+ Marty Alchin as of 2008-10-08 16:09:57 (Created)
The whole registration of a history manager begins by assigning a HistoricalRecords
object to a model, so that's a good place to start defining code. This will live in the models.py
module of a new history
application. There are a number of things that have to happen in sequence to get the history system initialized for a particular model; at a high level, HistoricalRecords
manages all of the following tasks:
1. Create a new model to store the historical records, mirroring the structure of the original model.
2. Register signal handlers to record a new historical entry whenever an instance of the original model is saved or deleted.
3. Add a manager to the original model, under the name where HistoricalRecords was assigned. This manager will then access historical information.
That's a short list, but each step requires a fair amount of code, combining several of the techniques described throughout this book. Before any of those steps can really begin, there's a small amount of housekeeping that must be done. Since the HistoricalRecords object gets assigned as an attribute of a model, the first chance it gets to execute is in the contribute_to_class() method.
object gets assigned as an attribute of a model, the first chance it gets to execute is in the contribute_to_class()
method.
class HistoricalRecords(object):
    def contribute_to_class(self, cls, name):
        self.manager_name = name
So far it's not much, but this is the only point in the process where Django tells HistoricalRecords
what name it was given when assigned to the model. This is stored away for future reference during Step 3.
In order to store the data from a model instance in a historical record that can be easily added, searched and retrieved, we need a new model behind the scenes. In theory, we could use any structure that can contain data; perhaps a single TextField that contains pickled objects. But to search and browse the historical data more easily, it makes sense to use the same data structure as the original model itself.
Chapter 3 showed that a model's _meta attribute contains all the information about how that model was defined, including all of its fields in the order they were declared. This information is crucial, because it allows us to create a new model that matches that same data structure. The only trouble is that contribute_to_class() gets called on each field in turn, in the order they appear in the namespace dictionary Python created for the model's definition. Since standard dictionaries don't have a guaranteed order, there's no way to predict how many fields will already have been processed by the time HistoricalRecords gets a chance to peek at the model.
To solve this, we turn to a signal: class_prepared. Django fires this signal once all the fields and managers have been added to the model and everything is in place to be used by external code. That's when HistoricalRecords will have guaranteed access to all the fields, including the order in which they were defined, so contribute_to_class() continues by setting up a listener for class_prepared.
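To see why this pattern works, it helps to remember that a Django signal is essentially a registry of callbacks keyed by sender. The sketch below is a deliberately simplified, framework-free stand-in for that idea; the Signal class here is an illustration, not Django's actual implementation.

```python
class Signal(object):
    """Minimal stand-in for Django's Signal, for illustration only."""
    def __init__(self):
        self._receivers = []

    def connect(self, receiver, sender=None):
        # Remember which sender (if any) this receiver cares about.
        self._receivers.append((receiver, sender))

    def send(self, sender, **kwargs):
        for receiver, wanted in self._receivers:
            if wanted is None or wanted is sender:
                receiver(sender=sender, **kwargs)

class_prepared = Signal()
prepared = []
class_prepared.connect(lambda sender, **kwargs: prepared.append(sender),
                       sender='ImportantModel')

class_prepared.send(sender='OtherModel')      # ignored: wrong sender
class_prepared.send(sender='ImportantModel')  # triggers the receiver
assert prepared == ['ImportantModel']
```

The sender argument is what lets HistoricalRecords hear about only the one model it was assigned to, rather than every model in the project.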
from django.db import models

class HistoricalRecords(object):
    def contribute_to_class(self, cls, name):
        self.manager_name = name
        models.signals.class_prepared.connect(self.finalize, sender=cls)
Django will now call HistoricalRecords.finalize() with the fully-prepared model once everything is in place to continue processing it. That method is then responsible for performing all of the remaining tasks, all the way through Step 3. Most of the details are delegated to other methods, but finalize() coordinates them.
The first thing finalize() needs to do is copy the original model to create a new model with extra fields attached. It defers this task to the create_history_model() method, which in turn relies on a few other methods.
import copy
import datetime

from django.db import models

from current_user import models as current_user

class HistoricalRecords(object):
    def contribute_to_class(self, cls, name):
        self.manager_name = name
        models.signals.class_prepared.connect(self.finalize, sender=cls)

    def finalize(self, sender, **kwargs):
        history_model = self.create_history_model(sender)

    def create_history_model(self, model):
        """
        Creates a historical model to associate with the model provided.
        """
        attrs = self.copy_fields(model)
        attrs.update(self.get_extra_fields(model))
        attrs.update(Meta=type('Meta', (), self.get_meta_options(model)))
        name = 'Historical%s' % model._meta.object_name
        return type(name, (models.Model,), attrs)
There are a few different sub-steps required in creating a model like this. Adding all the logic in one method would hamper readability and maintainability, so it's been broken up into three additional methods.
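The dynamic class creation at the end of create_history_model() is plain Python: the three-argument form of type() takes a class name, a tuple of base classes and a dictionary of attributes, and returns a new class. A small illustration, with names invented for the example:

```python
# Build the attribute dictionary first, just as create_history_model() does.
attrs = {
    '__module__': __name__,
    'describe': lambda self: 'instance of %s' % type(self).__name__,
}

# type(name, bases, attrs) creates the class dynamically.
HistoricalExample = type('HistoricalExample', (object,), attrs)

obj = HistoricalExample()
assert obj.describe() == 'instance of HistoricalExample'
```

Because Django models are themselves built by a metaclass, passing models.Model as the base here triggers all of the usual model setup machinery for the new class.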
The copy_fields() method is tasked with copying the existing fields on the model, returning a dictionary with new fields that can be applied to the history model. This is a more complicated task than it may sound, because there are a few special cases that need to be accounted for, but Python provides a tool to help with the common case.
Python's copy module is designed to copy an object and all of its attributes into a new object. This operation is necessary in the event that any of these field attributes get changed; changing one field shouldn't affect another. If we simply assigned the existing field to the new model, they would be the same object, sharing a namespace; a change to one would affect the other, which isn't a good thing. The copy.copy() function takes care of our needs.
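The difference between sharing and copying is easy to demonstrate with plain objects; the Field class below is a made-up stand-in for a real model field:

```python
import copy

class Field(object):
    """Stand-in for a model field, for illustration only."""
    def __init__(self):
        self.db_index = False

original = Field()
shared = original                # same object: two names, one namespace
duplicate = copy.copy(original)  # new object with copied attributes

duplicate.db_index = True
assert original.db_index is False  # the copy changed independently

shared.db_index = True
assert original.db_index is True   # the shared name affected the original
```

This is exactly why copy_fields() copies each field before altering attributes like primary_key or db_index on it.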
After copying each field, copy_fields() has to take care of two special cases. The first is that a model can only ever contain one AutoField attribute, and it must be the primary key. Django provides an AutoField as the primary key for any model that doesn't explicitly declare a different primary key, so this is a common case. Since there will likely be multiple historical records for a given ID, the history model has a separate AutoField for its primary key. Any existing AutoField instances found on the original model must be changed to a standard IntegerField on the history model.
The next special case to take care of is that uniqueness can no longer be guaranteed on any field. Both the unique and primary_key arguments imply that a field's value must be unique across all rows in the model, which won't be true in a historical context. Any field found with either of these attributes set to True is changed to False, with db_index set to True instead. Having a unique field on the original model implies some importance, so adding an index to it will help speed up queries that rely on that field's content.
In addition to the fields themselves, every model needs an attribute named __module__ that Django can use to determine what application it belongs to. Since this history model will be tied to the original model, __module__ can be copied straight over to the new model along with its fields. This way, copy_fields() provides everything necessary to make the history model function like the original model as much as possible.
def copy_fields(self, model):
    """
    Creates copies of the model's original fields, returning
    a dictionary mapping field name to copied field object.
    """
    # Though not strictly a field, this attribute
    # is required for a model to function properly.
    fields = {'__module__': model.__module__}
    for field in model._meta.fields:
        field = copy.copy(field)
        if isinstance(field, models.AutoField):
            # The historical model gets its own AutoField, so any
            # existing one must be replaced with an IntegerField.
            field.__class__ = models.IntegerField
        if field.primary_key or field.unique:
            # Unique fields can no longer be guaranteed unique,
            # but they should still be indexed for faster lookups.
            field.primary_key = False
            field._unique = False
            field.db_index = True
        fields[field.name] = field
    return fields
So far, the history model is only set up to store the same values as the original model. It does keep a historical record of each stage an instance went through, but without anything else, it's of little real-world value. It needs some extra information along with that data to make it useful. There are a few basic components that are useful in nearly all situations.
- history_id: a new AutoField to serve as the primary key of the history model itself.
- history_date: the date and time when the historical record was created.
- history_user: the user who made the change, recorded with the CurrentUserField explained earlier in this chapter.
- history_type: the type of change: '+' for a new instance, '~' for an update to an existing instance and '-' for a deleted instance.
Most models also include a __unicode__() method that controls how a model instance will be displayed when printed to a console or written to a string, such as a template. To preserve this while still indicating its historical status, get_extra_fields() provides a new __unicode__() method that simply uses the original method and adds a date to the end of the string. This is done with the help of a special history_object attribute, which will be explained in the next section.
def get_extra_fields(self, model):
    """
    Returns a dictionary of fields that will be added to the historical
    record model, in addition to the ones returned by copy_fields().
    """
    rel_nm = '_%s_history' % model._meta.object_name.lower()
    return {
        'history_id': models.AutoField(primary_key=True),
        'history_date': models.DateTimeField(default=datetime.datetime.now),
        'history_user': current_user.CurrentUserField(related_name=rel_nm),
        'history_type': models.CharField(max_length=1, choices=(
            ('+', 'Created'),
            ('~', 'Changed'),
            ('-', 'Deleted'),
        )),
        'history_object': HistoricalObjectDescriptor(model),
        '__unicode__': lambda self: u'%s as of %s' % (self.history_object,
                                                      self.history_date),
    }
One advantage of providing this extra information in a separate method is that get_extra_fields() offers a chance for customization. Many projects have some additional information, such as a SITE_ID, that could be logged alongside this information to give greater insight into the data. Overriding get_extra_fields() provides an opportunity to easily add those extra fields.
from django.conf import settings
from django.db import models

class SiteHistoricalRecords(HistoricalRecords):
    def get_extra_fields(self, model):
        fields = super(SiteHistoricalRecords, self).get_extra_fields(model)
        fields.update({
            'history_site': models.IntegerField(default=settings.SITE_ID),
        })
        return fields
In addition, overriding get_extra_fields() allows other types of customization. If the provided field names or the __unicode__() implementation don't suit your taste, feel free to replace them. This makes get_extra_fields() the method to override for the more flexible aspects of the history model.
Since the history model is designed for storing and retrieving information about what a model instance looked like at points in the past, it makes sense to have access to an instance of the original model with the historical values. This provides access to any custom methods or other attributes that didn't get copied over to the history model. This is especially necessary for implementing a proper __unicode__() representation.
Since the historical record of an instance contains all of the field values of the instance itself, no additional database calls are necessary to populate an instance of the original model. This is accessible on a historical instance through the history_object attribute, which is implemented as a descriptor.
class HistoricalObjectDescriptor(object):
    def __init__(self, model):
        self.model = model

    def __get__(self, instance, owner):
        values = (getattr(instance, f.attname) for f in self.model._meta.fields)
        return self.model(*values)
It needs to take the original model as an argument and use that instead of the owner argument to __get__(), because owner is the history model, not the original model. Using the original model's collection of fields as a guide, the descriptor pulls the appropriate values from the instance and creates a new instance. This new instance has all the original methods, including save(), which can be used to restore an older copy of an instance.
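The descriptor protocol used here works anywhere in Python, not just inside Django. A framework-free sketch, with names invented for the example, of a descriptor that builds its value from the other attributes of the instance it's accessed through:

```python
class FullName(object):
    """Descriptor assembling a value from the instance's other attributes."""
    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class itself, not an instance.
            return self
        return '%s %s' % (instance.first_name, instance.last_name)

class Person(object):
    full_name = FullName()

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

author = Person('Marty', 'Alchin')
assert author.full_name == 'Marty Alchin'
```

HistoricalObjectDescriptor follows the same shape, except that instead of formatting a string, it instantiates the original model from the stored field values.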
Another necessary item for creating the history model is a dictionary of options to be included as the model's Meta inner class. The only option that is actually required is ordering, which makes sure that the records are sorted in descending order by date.
def get_meta_options(self, model):
    """
    Returns a dictionary of options that will be added to
    the Meta inner class of the historical record model.
    """
    return {
        'ordering': ('-history_date',),
    }
Other implementations can override this method to add more options as well, if necessary. With nearly everything in place, all that's left is for create_history_model() to create a new name for the history model and pass everything to type() to create it. Then, finalize() can use that new model to perform additional tasks.
There are two ways of modifying a model instance, and Django provides signals to hook into both of them; the post_save and post_delete signals are fired when saving and deleting an instance, respectively.
def finalize(self, sender, **kwargs):
    history_model = self.create_history_model(sender)
    # The HistoricalRecords object will be discarded,
    # so the signal handlers can't use weak references.
    models.signals.post_save.connect(self.post_save, sender=sender,
                                     weak=False)
    models.signals.post_delete.connect(self.post_delete, sender=sender,
                                       weak=False)
The HistoricalRecords object isn't used for anything after its initial setup, so it's discarded by Python's garbage collection fairly quickly. Because the signal handlers are methods of that object and signals use weak references by default, the handlers would get removed from the signal as soon as HistoricalRecords goes away. Passing weak=False forces the signals to use strong references for these methods, keeping them alive long enough to do their jobs.
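The effect of weak versus strong references can be seen with Python's weakref module alone. This sketch uses weakref.WeakMethod (available in Python 3.4 and later) to mimic how a signal's default weak reference to a bound method dies with its owner; the Recorder class is invented for the example:

```python
import gc
import weakref

class Recorder(object):
    def handle(self, **kwargs):
        return 'recorded'

weak_owner = Recorder()
strong_owner = Recorder()

weak_ref = weakref.WeakMethod(weak_owner.handle)  # like the default weak=True
strong_ref = strong_owner.handle                  # like passing weak=False

del weak_owner, strong_owner
gc.collect()  # not strictly needed in CPython, but makes the point explicit

assert weak_ref() is None           # weakly-referenced handler is gone
assert strong_ref() == 'recorded'   # strong reference kept its owner alive
```

The bound method held in strong_ref keeps its Recorder instance alive even after the name is deleted, which is exactly what weak=False achieves for the HistoricalRecords handlers.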
Like most of this system, the actual implementations of these two signal handlers each delegate to a separate method to reuse code. They both perform the same task, adding an entry to the history model, so it makes sense to share code as well. The only difference between the two is the value each provides for the history_type field of the historical record.
def post_save(self, instance, created, **kwargs):
    self.create_historical_record(instance, created and '+' or '~')

def post_delete(self, instance, **kwargs):
    self.create_historical_record(instance, '-')

def create_historical_record(self, instance, type):
    manager = getattr(instance, self.manager_name)
    attrs = {}
    for field in instance._meta.fields:
        attrs[field.attname] = getattr(instance, field.attname)
    manager.create(history_type=type, **attrs)
The manager used to create this entry in the history model is determined according to the manager_name attribute that was set aside when contribute_to_class() was called at the beginning of the process. Assigning that manager is the last step of the process.
In order to access the historical records for a given model, a manager is attached to the original model using the name where the HistoricalRecords object was assigned. The object that gets assigned is actually a descriptor, which creates a customized manager when accessed. All the manager code is located in a new module, manager.py, which is referenced from models.py as follows:
import copy
import datetime

from django.db import models

from current_user import models as current_user
import history.manager

class HistoricalRecords(object):
    def contribute_to_class(self, cls, name):
        self.manager_name = name
        models.signals.class_prepared.connect(self.finalize, sender=cls)

    def finalize(self, sender, **kwargs):
        history_model = self.create_history_model(sender)
        # The HistoricalRecords object will be discarded,
        # so the signal handlers can't use weak references.
        models.signals.post_save.connect(self.post_save, sender=sender,
                                         weak=False)
        models.signals.post_delete.connect(self.post_delete, sender=sender,
                                           weak=False)
        descriptor = history.manager.HistoryDescriptor(history_model)
        setattr(sender, self.manager_name, descriptor)

    # Additional methods described in previous sections
The addition of those lines completes the code necessary in models.py; the remainder of the functionality is implemented in manager.py instead. The descriptor that gets assigned to the original model is fairly simple, storing the history model and using that to create customized managers. The HistoryManager then accepts the history model and instance and stores them for later. Note that instance is an optional argument, allowing the manager to be used on the original model itself. This, in turn, will retrieve all historical records for that model, regardless of what instance they are attached to.
Notice also that the model attribute is received and stored, but not used by any of this code. Django's own Manager class uses self.model to determine what model it should reference in database queries. Assigning the right model to self.model is all that's necessary to tell Django how to get data from the correct table and formulate results using the appropriate instances.
from django.db import models

class HistoryDescriptor(object):
    def __init__(self, model):
        self.model = model

    def __get__(self, instance, owner):
        if instance is None:
            return HistoryManager(self.model)
        return HistoryManager(self.model, instance)

class HistoryManager(models.Manager):
    def __init__(self, model, instance=None):
        super(HistoryManager, self).__init__()
        self.model = model
        self.instance = instance

    def get_query_set(self):
        if self.instance is None:
            return super(HistoryManager, self).get_query_set()
        filter = {self.instance._meta.pk.name: self.instance.pk}
        return super(HistoryManager, self).get_query_set().filter(**filter)
This overridden get_query_set() method is what allows a HistoryManager to retrieve objects matching the ID of a given instance of the original model. No special ordering is required because the history model's Meta inner class already has an ordering option set. In addition to simply retrieving a list of related historical records, HistoryManager can contain extra methods to perform more specific searches.
In the event that a model instance has changed since the last time it was saved or was even deleted previously, it becomes necessary to quickly and easily retrieve the last known state of the instance. Since that information is stored in a history model, which in turn is accessible through the HistoryManager, a new manager method can do this job without requiring any arguments at all. The first requirement is that this method should not be available on the model itself, only on instances.
def most_recent(self):
    """
    Returns the most recent copy of the instance available in the history.
    """
    if not self.instance:
        # self.instance is None here, so the model name has to
        # come from self.model instead.
        raise TypeError("Can't use most_recent() without a %s instance." %
                        self.model._meta.object_name)
Now that we can be sure there is a valid model instance to work with, the next step is to gather up the field names that exist on the model, so that only those fields are retrieved. This method returns an instance of the original model, not the history model. Retrieving any additional fields would not only be wasteful, it would also require more code to remove them before populating the model instance, since those extra fields aren't supported by that model.
def most_recent(self):
    """
    Returns the most recent copy of the instance available in the history.
    """
    if not self.instance:
        raise TypeError("Can't use most_recent() without a %s instance." %
                        self.model._meta.object_name)
    fields = (field.name for field in self.instance._meta.fields)
Note that this needs to use the _meta attribute of self.instance, rather than self.model, because self.model is the history model, not the original model that we're keeping track of. With a list of fields in place, a simple call to values_list() retrieves the values for all recorded states of the given instance. Because those states are sorted descending by date, the first row is always the most recent, so using an index of 0 will issue the appropriate query.
def most_recent(self):
    """
    Returns the most recent copy of the instance available in the history.
    """
    if not self.instance:
        raise TypeError("Can't use most_recent() without a %s instance." %
                        self.model._meta.object_name)
    fields = (field.name for field in self.instance._meta.fields)
    try:
        values = self.values_list(*fields)[0]
    except IndexError:
        raise self.instance.DoesNotExist("%s has no historical record." %
                                         self.instance._meta.object_name)
Catching IndexError allows for a more useful error message in the event that there is no history data available for the given instance. In this case, a different exception is raised, using the model's own DoesNotExist class to keep in line with the way Django's own instance lookups work. If no error is raised, values will have all the values necessary to populate an instance of the original model. The method then does exactly that and returns the instance for other code to use.
def most_recent(self):
    """
    Returns the most recent copy of the instance available in the history.
    """
    if not self.instance:
        raise TypeError("Can't use most_recent() without a %s instance." %
                        self.model._meta.object_name)
    fields = (field.name for field in self.instance._meta.fields)
    try:
        values = self.values_list(*fields)[0]
    except IndexError:
        raise self.instance.DoesNotExist("%s has no historical record." %
                                         self.instance._meta.object_name)
    return self.instance.__class__(*values)
Similar to most_recent(), it's also sometimes useful to see what a model instance looked like on some specific date or at a particular time. This is useful, for instance, when customers ask about products or resources that they heard about some time ago. Being able to retrieve the item as it existed on the date in question can be a valuable tool in serving those customers' needs. Like most_recent(), the new as_of() method starts by making sure it only gets used on a model instance, rather than the model itself.
def as_of(self, date):
    """
    Returns an instance of the original model with all the attributes set
    according to what was present on the object on the date provided.
    """
    if not self.instance:
        raise TypeError("Can't use as_of() without a %s instance." %
                        self.model._meta.object_name)
The list of fields is retrieved the same way as in most_recent(), but it's not passed directly into a values_list() query. The as_of() query needs to limit its results to the data that was accurate on the date supplied, so we must first apply a filter() to satisfy that condition.
def as_of(self, date):
    """
    Returns an instance of the original model with all the attributes set
    according to what was present on the object on the date provided.
    """
    if not self.instance:
        raise TypeError("Can't use as_of() without a %s instance." %
                        self.model._meta.object_name)
    fields = (field.name for field in self.instance._meta.fields)
    qs = self.filter(history_date__lte=date)
This new QuerySet is what we need to retrieve the instance values for the particular date. Again, a values_list() query is used and limited to the first result to obtain the record nearest to the date provided. If no records are found, the same DoesNotExist exception is raised, but with a slightly different message to indicate that there may be records for the object, but none before the date specified.
def as_of(self, date):
    """
    Returns an instance of the original model with all the attributes set
    according to what was present on the object on the date provided.
    """
    if not self.instance:
        raise TypeError("Can't use as_of() without a %s instance." %
                        self.model._meta.object_name)
    fields = (field.name for field in self.instance._meta.fields)
    qs = self.filter(history_date__lte=date)
    try:
        values = qs.values_list('history_type', *fields)[0]
    except IndexError:
        raise self.instance.DoesNotExist("%s had not yet been created." %
                                         self.instance._meta.object_name)
Note also that the values_list() query used here includes an extra field not present in the most_recent() query. One last check must be performed on the data before it's used to populate an instance of the model, and the history_type is necessary for that. If the row returned from the query has a history_type of '-', that means the instance was deleted prior to the date specified, so it technically didn't exist as of that date. Rather than return an object that didn't exist, as_of() raises DoesNotExist, explaining what happened.
def as_of(self, date):
    """
    Returns an instance of the original model with all the attributes set
    according to what was present on the object on the date provided.
    """
    if not self.instance:
        raise TypeError("Can't use as_of() without a %s instance." %
                        self.model._meta.object_name)
    fields = (field.name for field in self.instance._meta.fields)
    qs = self.filter(history_date__lte=date)
    try:
        values = qs.values_list('history_type', *fields)[0]
    except IndexError:
        raise self.instance.DoesNotExist("%s had not yet been created." %
                                         self.instance._meta.object_name)
    if values[0] == '-':
        raise self.instance.DoesNotExist("%s had already been deleted." %
                                         self.instance._meta.object_name)
With all the sanity checks completed, we can be certain that the values retrieved are valid for an object that existed on the date passed to the method. Since the first value retrieved in the QuerySet was the history_type, which isn't part of the original model, a slice is taken to retrieve the rest of the values, which are then passed to the model instead.
def as_of(self, date):
    """
    Returns an instance of the original model with all the attributes set
    according to what was present on the object on the date provided.
    """
    if not self.instance:
        raise TypeError("Can't use as_of() without a %s instance." %
                        self.model._meta.object_name)
    fields = (field.name for field in self.instance._meta.fields)
    qs = self.filter(history_date__lte=date)
    try:
        values = qs.values_list('history_type', *fields)[0]
    except IndexError:
        raise self.instance.DoesNotExist("%s had not yet been created." %
                                         self.instance._meta.object_name)
    if values[0] == '-':
        raise self.instance.DoesNotExist("%s had already been deleted." %
                                         self.instance._meta.object_name)
    return self.instance.__class__(*values[1:])
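Stripped of the database, the logic of as_of() boils down to filtering records by date, taking the newest survivor and rejecting deletions. A hypothetical in-memory version, with the record layout invented for the example:

```python
import datetime

# Each record: (history_type, history_date, field values)
records = [
    ('+', datetime.datetime(2008, 10, 8, 16, 9), {'name': 'Marty'}),
    ('~', datetime.datetime(2008, 10, 8, 16, 46), {'name': 'Marty Alchin'}),
    ('-', datetime.datetime(2008, 10, 8, 17, 19), {'name': 'Marty Alchin'}),
]

def as_of(records, date):
    # Keep only records at or before the requested date.
    candidates = [r for r in records if r[1] <= date]
    if not candidates:
        raise LookupError('had not yet been created')
    history_type, _, values = max(candidates, key=lambda r: r[1])
    if history_type == '-':
        raise LookupError('had already been deleted')
    return values

assert as_of(records, datetime.datetime(2008, 10, 8, 16, 30)) == {'name': 'Marty'}
```

The real method does the filtering in the database and rebuilds a model instance from the values, but the three checks are the same: no record yet, deleted already, or a valid snapshot.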
The tools and techniques discussed in this book go well beyond the official Django documentation, but there's still a lot left unexplored. There are plenty of other innovative ways to use Django and Python; the rest is up to you.
As you work your way through your own applications, be sure to consider giving back to the Django community. The framework is available because others decided to distribute it freely; by doing the same, you can help even more people uncover more possibilities. The Appendix explains how you can give back to the community and to the framework itself.