CHAPTER 11
Enhancing Applications

Once a site has a set of basic applications in working order, the next step is to add more advanced functionality to complement the existing behavior. This can sometimes be a matter of simply adding more applications, each providing new features for users and employees alike. Other times, there are ways of enhancing your existing applications so they grow new features directly, without a separate application that can stand on its own.

These "meta-applications" or "sub-frameworks" are built with the goal of easily integrating into an existing application, using hooks that are already provided. This book has illustrated many such hooks, and they can be used in combination to great effect. It's often possible to write a tool that performs a lot of tasks but only requires adding a single line of code to an existing application.

Recording the Current User

One common need among larger companies is surveillance of the company's data. It's important to know who is making changes; if anything goes wrong, there's someone who can be held accountable. This is also useful for smaller businesses, but the need is less urgent with fewer employees who have access to the data in question.

Nearly any organization of any size can benefit from knowing who last changed a model instance, but it's often left to "those other guys" who really need it. It doesn't have to be that way. Combining a few of the tools available in Django, recording the user who last made changes to a model instance doesn't have to be a difficult task.

On the surface, it seems ridiculously simple. A standard way to override what values get saved for a field is to override the model's save() method and set the value there. Chapter 10 showed how this can be used to calculate one field's value based on the values of other fields. It seems reasonable that this same technique could be used to insert the current user into a ForeignKey field.

from django.db import models
from django.contrib.auth.models import User

class ImportantModel(models.Model):
    data = models.TextField()
    user = models.ForeignKey(User, null=True)
    def save(self):
        self.user = request.user
        super(ImportantModel, self).save()

There's just one problem: where does request come from? Django's models are separated from views, so the request isn't just magically available inside a model method. One possible solution is to pass the user in manually. After all, the request isn't what's really important; it's the user we're after.

    def save(self, user):
        self.user = user
        super(ImportantModel, self).save()

Then, when the time comes for a view to save an ImportantModel instance, it goes something like this:

def important_view(request, data):

    # Form validation and cleaning

    obj = ImportantModel(data=form.cleaned_data['data'])
    obj.save(request.user)

That would certainly do the trick, but only in views that are specifically written to use this custom save() method. All existing applications, including Django's own admin application, expect save() to work without arguments. Writing a model with that kind of a save() method will mean writing—or at least, modifying—all your own applications to work with it.
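The incompatibility can be reproduced without Django at all. The following is a minimal sketch using plain classes; Model and generic_save() are hypothetical stand-ins for a model base class and for existing code (such as the admin) that calls save() with no arguments.

```python
class Model(object):
    """Hypothetical stand-in for a model base class."""
    def save(self):
        self.saved = True


class ImportantModel(Model):
    def save(self, user):
        # The extra required argument changes the method's signature.
        self.user = user
        super(ImportantModel, self).save()


def generic_save(obj):
    """Hypothetical stand-in for existing code that expects
    save() to work without arguments."""
    obj.save()


obj = ImportantModel()
obj.save('alice')  # Fine for callers written against the new signature

error = None
try:
    generic_save(obj)  # Existing callers fail with a TypeError
except TypeError as e:
    error = e
```

Any caller that isn't rewritten to pass a user fails in the same way, which is why this approach doesn't scale beyond views you control.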

The core problem now shows itself: how do we get the current user into a model method without changing how the function gets called?

The Thread-Local Approach—Useful but Dangerous

The most common approach for a long time was to make use of the fact that an incoming request is processed all the way through to outgoing response in a single thread. Regardless of what server environment Django was running in, these simple facts remained: one thread processes one request at a time and one request goes through just one thread. Coupled with Python's own threading module, a solution was born.

The threading module, which provides tools for working with threaded applications, provides a function called local(). This function returns an object that works much like a private dictionary, storing values as attributes, and it can be used as a namespace for a thread. Functions can read from it and write to it without worrying about interfering with other threads. Each thread gets its own private set of values, a task managed by Python itself.
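The isolation between threads can be seen using only the standard library. This is a small sketch written for Python 3; the worker() function and the results dictionary are illustrative names, and the barrier simply makes sure both threads have written their values before either one reads.

```python
import threading

namespace = threading.local()
barrier = threading.Barrier(2)
results = {}


def worker(name):
    # Both threads assign to the same attribute name...
    namespace.user = name
    # ...and wait until the other thread has done the same.
    barrier.wait()
    # Each thread still reads back only the value it stored itself.
    results[name] = namespace.user


threads = [threading.Thread(target=worker, args=(name,))
           for name in ('alice', 'bob')]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# The main thread never assigned namespace.user, so it sees nothing.
main_has_user = hasattr(namespace, 'user')
```

Neither thread sees the other's value, and the main thread, which never stored anything, has no user attribute at all.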

This feature was once useful enough that Django itself contains a version of it for compatibility with Python 2.3 (threading.local() was introduced in Python 2.4). This copy included with Django formed enough justification for many programmers to begin using a thread-local dictionary to store the request, which could then be retrieved by a model's save() method. Since the request is always available to middleware, we ended up with a middleware module that looked something like this:

try:
    from threading import local
except ImportError:
    # Fallback for Python 2.3
    from django.utils._threading_local import local

thread_namespace = local()

class ThreadLocalMiddleware:
    def process_request(self, request):
        # Set the current user in the local() dictionary
        thread_namespace.user = getattr(request, 'user', None)

    def process_response(self, request, response):
        # Clear the user now that the request is finished
        thread_namespace.user = None
        return response

After that, a model method such as save() just has to import the thread_namespace variable and read the user out of it.

from django.db import models
from django.contrib.auth.models import User

from thread_local_middleware import thread_namespace

class ImportantModel(models.Model):
    data = models.TextField()
    user = models.ForeignKey(User, null=True)

    def save(self):
        self.user = getattr(thread_namespace, 'user', None)
        super(ImportantModel, self).save()

Every time a model instance is saved, save() updates the user attribute to be the user who is currently logged in at the time. However, there are a few potential pitfalls with this approach.

First, threading.local() becomes a bit of a dumping ground for data. Nothing actively manages this dictionary, so there's no central place to look to find out what might be in it. Any code can put things in there, without any clear way of indicating that it did so.

Taking that problem a step further, applications don't know what other code is already using threading.local(), so there's no way to prevent name clashes. If two applications running in the same thread—perhaps a view that calls out to a third-party library as part of its processing—assign data to the same variable name, the later assignment wins. Even if an application actively checks for the presence of a name before assigning to it, there's no reasonable way to know if that application itself had previously assigned the variable or if another application is using the same name.

The worst part about these issues is that they're intermittent. Whether they cause any real problems depends entirely on what applications use threading.local() and which ones of those are executed within the same thread. Everything could be working fine one day and break the next, because a new third-party library makes use of threading.local() in a way you didn't anticipate.

Python's module-level global variables are at least safe enough that one module can't accidentally overwrite a global variable in another module—it's possible to do, but not accidentally. Without offering that level of protection, threading.local() becomes a dangerous tool to rely on without fully understanding how all the code in your site is using it.
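The name-clash hazard is easy to reproduce. The sketch below assumes that both pieces of code share a single namespace object, as would happen if they imported the same thread_namespace from a common module; all three function names are hypothetical.

```python
import threading

shared = threading.local()


def audit_app_start(user):
    # One application stores the current user under the name 'user'.
    shared.user = user


def third_party_library():
    # Unrelated code picks the same name for its own purposes.
    shared.user = {'locale': 'en-us'}


def audit_app_finish():
    # The first application reads back whatever is stored there now.
    return shared.user


audit_app_start('alice')
third_party_library()
found = audit_app_finish()
```

The later assignment silently wins, and nothing tells the first application that its value was replaced.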

The Admin Approach

Instead of relying on thread locals, another approach is to consider the actual use case. Typically, people turn to thread locals as a way of storing the current user in a table while using the built-in admin application. Custom views make it easy enough to insert the user manually, so it's usually the admin interface that prompts developers to investigate these techniques.

Shortly before Django 1.0 was released, the admin application was overhauled in order to provide more flexibility and features. One of the additions was a result of the desire to access the current user while working in the admin. Now there is a way to override how the admin saves a given model, also providing access to the request object along the way. Remember the definition of ImportantModel from the previous section:

from django.db import models
from django.contrib.auth.models import User

class ImportantModel(models.Model):
    data = models.TextField()
    user = models.ForeignKey(User, null=True)

By adding an admin.py module to the application, it's possible to supply a new method to use when the admin attempts to save an instance of this model.

from django.contrib import admin

from important_app import models

class ImportantModelAdmin(admin.ModelAdmin):
    def save_model(self, request, instance, form, change):
        instance.user = request.user
        super(ImportantModelAdmin, self).save_model(request, instance, form, change)

admin.site.register(models.ImportantModel, ImportantModelAdmin)

Now, when a user adds a new instance of ImportantModel or changes an existing instance, the admin will use this method, and the current user will be added to the instance accordingly. There's no need to worry about anonymous users, because the admin is only available to authenticated users.

This works quite well, following the "ideal" approach where the request is simply passed to those methods that need it. The only problem is that it only works for the admin application. Other applications that could benefit from storing the current user on a related model, without having to rewrite existing views, are still left needing another solution.

Introducing the CurrentUserField

Another alternative approach is to keep track of the fields that need to contain the current user and update those fields whenever instances of their associated models are changed. This goes back to being a model-based approach, rather than being view-based, but it no longer requires an override of the save() method.

The first step is to mark a field as needing the current user to be inserted when the model is saved. This task is traditionally handled by save() or a view, but this approach will use a new type of field to manage it. This new CurrentUserField will live in the models.py module of a new current_user application.

from django.db import models
from django.contrib.auth.models import User

class CurrentUserField(models.ForeignKey):
    def __init__(self, **kwargs):
        super(CurrentUserField, self).__init__(User, null=True, **kwargs)

It's currently little more than a specialized ForeignKey that's been hard-coded to relate with Django's built-in User model. It specifies null=True to account for applications where the model may be edited by an anonymous user and other non-Web applications that might add or update records. As it stands, it would be usable in that regard alone, by simply replacing an existing ForeignKey with a new CurrentUserField. Here is how it looks on the ImportantModel from the preceding sections:

from django.db import models
from current_user.models import CurrentUserField

class ImportantModel(models.Model):
    data = models.TextField()
    user = CurrentUserField()

Very little has changed in this incarnation: one field has a new type, and one import was updated accordingly. The notable effect of this simple change is that no additional changes will be necessary on the model after this point. You can go through and add CurrentUserField to whatever models you like now, knowing that they'll continue to work while we work through the remainder of the code to support them properly.

Keeping Track of CurrentUserField Instances

The next thing to take care of is to keep a record of all the models that have CurrentUserField instances attached to them. This is important for performance; without it, the user-updating code described later would have to look at every model that gets saved and cycle over its fields, looking for instances of CurrentUserField. Instead, we can supply a registry of known instances that can speed things up considerably.

A new module, registration.py, will contain the code necessary to maintain a record of every CurrentUserField in use and supply information about that registry to other code that asks for it. It uses a slightly modified notion of the Borg pattern, looking fairly similar to the plugin architecture registry from Chapter 2.

class FieldRegistry(object):
    _registry = {}

    def add_field(self, model, field):
        reg = self.__class__._registry.setdefault(model, [])
        reg.append(field)

    def get_fields(self, model):
        return self.__class__._registry.get(model, [])

    def __contains__(self, model):
        return model in self.__class__._registry

The internal _registry dictionary exists only on the FieldRegistry class and is never copied out to any instances. All methods operate on that class-level dictionary, so it doesn't matter how many instances of FieldRegistry get created. All instances will use the same dictionary all the time. Take a look at it in action:

>>> from current_user.registration import FieldRegistry
>>> from important_app.models import ImportantModel
>>> registry = FieldRegistry()
>>> registry.add_field(ImportantModel, ImportantModel._meta.get_field('user'))
>>> registry.get_fields(ImportantModel)
[<current_user.models.CurrentUserField object at 0x...>]
>>> another_registry = FieldRegistry()
>>> ImportantModel in another_registry
True

Note also that this allows for more than one field to be registered on a given model. Since Django allows the same type of field to be used multiple times on a single model, FieldRegistry supports that as well. If the registry stored just one field per model, the last one assigned to the model would overwrite the first, which could cause confusion about what's going on behind the scenes. By explicitly supporting multiple fields per model, we can avoid that problem entirely.
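To make the difference concrete, here is a Django-free sketch. It repeats the relevant parts of FieldRegistry so that it runs on its own; the Article class and the two field names are hypothetical, with plain strings standing in for field objects.

```python
class FieldRegistry(object):
    _registry = {}

    def add_field(self, model, field):
        reg = self.__class__._registry.setdefault(model, [])
        reg.append(field)

    def get_fields(self, model):
        return self.__class__._registry.get(model, [])


class Article(object):
    """Hypothetical stand-in for a model class."""


# Strings stand in for field objects to keep the example simple.
FieldRegistry().add_field(Article, 'user')
FieldRegistry().add_field(Article, 'last_editor')

fields = FieldRegistry().get_fields(Article)

# For contrast, a mapping with one slot per model keeps only the last field.
single_slot = {}
single_slot[Article] = 'user'
single_slot[Article] = 'last_editor'
```

Here fields contains both names in the order they were registered, while single_slot retains only 'last_editor'.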

The last step in the registration process is actually adding instances of CurrentUserField to the registry when they're added to models. Remember from Chapter 3 that fields provide a contribute_to_class() method that executes while Django processes a model's contents. Overriding that method on CurrentUserField gives access to the model class as well as the name it was given, but the registry is only interested in the model and field objects.

from django.db import models
from django.contrib.auth.models import User
from current_user import registration

class CurrentUserField(models.ForeignKey):
    def __init__(self, **kwargs):
        super(CurrentUserField, self).__init__(User, null=True, **kwargs)

    def contribute_to_class(self, cls, name):
        super(CurrentUserField, self).contribute_to_class(cls, name)
        registry = registration.FieldRegistry()
        registry.add_field(cls, self)

Now CurrentUserField can register itself on any model it's attached to, without any additional intervention from you, the developer; simply assigning it to a model will suffice. This registration is the sole purpose of CurrentUserField, so its job is now done. All the rest of the work happens when a request is processed.

The CurrentUserMiddleware

Like the thread-local approach, CurrentUserField relies on a middleware class to get access to each incoming request and retrieve the current user. Middleware updates the fields without having to write your views specifically to do so, which opens it up for use in all applications, including the admin and other third-party applications where modifying code is problematic.

The real trick here is how to update CurrentUserField records without resorting to thread locals, and the answer is signals. Since Django provides a pre_save signal that fires just before an instance gets committed to the database, this new middleware can register a handler to execute at just the right time. These pieces come together in a new middleware.py module in the current_user application, starting with the workhorse: the update_users() method.

from current_user import registration

class CurrentUserMiddleware(object):
    def update_users(self, user, sender, instance, **kwargs):
        registry = registration.FieldRegistry()
        if sender in registry:
            for field in registry.get_fields(sender):
                setattr(instance, field.name, user)

As a signal handler, it gets two arguments from the pre_save signal: sender and instance; the user argument will be supplied by the process_request() method. Since sender is the model whose instance is currently being saved, it can be used to check whether the model is registered as having any CurrentUserField attributes. If so, it simply loops over them, setting the instance attribute for each one to the current user.

That won't do anything unless registered for the pre_save signal, which is a job for the process_request() method.

from django.db.models import signals
from django.utils.functional import curry

from current_user import registration

class CurrentUserMiddleware(object):
    def process_request(self, request):
        if hasattr(request, 'user') and request.user.is_authenticated():
            user = request.user
        else:
            user = None

        update_users = curry(self.update_users, user)
        signals.pre_save.connect(update_users, dispatch_uid=request, weak=False)

    def update_users(self, user, sender, instance, **kwargs):
        registry = registration.FieldRegistry()
        if sender in registry:
            for field in registry.get_fields(sender):
                setattr(instance, field.name, user)

This method starts out by checking whether the user is authenticated or not. Remember, CurrentUserField uses null=True to support anonymous users, so this step is necessary to make that distinction. By also checking to see if the request even has a user attribute at all, this handles cases where the default AuthenticationMiddleware is disabled or is placed after CurrentUserMiddleware in the MIDDLEWARE_CLASSES setting.

Continuing on, update_users() is curried into a new function, with the current user preloaded as its first argument. The resulting function is now configured for use as a signal handler. This handler will be connected for every incoming request, since the only way to know the current user is to get it when the request comes in. It must be disconnected when the request is finished; otherwise, handlers left over from earlier requests would be competing to update the same fields.

Since update_users() is curried, there won't be a reference for it once process_request() finishes executing. In order to keep it from being destroyed before it can be useful, it gets registered with weak=False. Since the middleware doesn't get to keep a reference to the curried function, the dispatch_uid argument provides an alternative reference for the handler. There will only be one signal handler for each incoming request, so the request object is a suitable unique identifier.
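Both points can be illustrated without Django. In this sketch, functools.partial stands in for django.utils.functional.curry, and the receivers dictionary with its connect() and disconnect() functions is a hypothetical, greatly simplified stand-in for a signal dispatcher; it is not how Django's dispatcher is implemented.

```python
import gc
import weakref
from functools import partial  # Stand-in for django.utils.functional.curry


def update_users(user, sender, instance, **kwargs):
    instance['user'] = user


def make_weak_handler(user):
    # The curried function is referenced only weakly once this returns.
    return weakref.ref(partial(update_users, user))


weak_handler = make_weak_handler('alice')
gc.collect()
# With no strong reference left, the curried function is gone.
handler_survived = weak_handler() is not None

# A strong reference, keyed by a unique identifier, keeps it alive
# and allows it to be found again later.
receivers = {}


def connect(handler, dispatch_uid):
    receivers[dispatch_uid] = handler


def disconnect(dispatch_uid):
    del receivers[dispatch_uid]


request = object()  # Any hashable, unique object can act as the identifier
connect(partial(update_users, 'alice'), dispatch_uid=request)

instance = {}
for handler in list(receivers.values()):
    handler(sender=None, instance=instance)

disconnect(dispatch_uid=request)
```

The weakly referenced handler disappears before it can run, while the one stored under the request's identifier updates the instance and can then be removed using that same identifier.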

Once the curried update_users() is then registered on the pre_save signal, Django continues on with other middleware and executes the view. Any models updated during that time will be checked by update_users() and updated as necessary. Once the view finishes, Django enters the response phase of middleware processing, where CurrentUserMiddleware needs to remove the listener, using the request to identify it.

from django.db.models import signals
from django.utils.functional import curry

from current_user import registration

class CurrentUserMiddleware(object):
    def process_request(self, request):
        if hasattr(request, 'user') and request.user.is_authenticated():
            user = request.user
        else:
            user = None
        update_users = curry(self.update_users, user)
        signals.pre_save.connect(update_users, dispatch_uid=request, weak=False)

    def update_users(self, user, sender, instance, **kwargs):
        registry = registration.FieldRegistry()
        if sender in registry:
            for field in registry.get_fields(sender):
                setattr(instance, field.name, user)

    def process_response(self, request, response):
        signals.pre_save.disconnect(dispatch_uid=request)
        return response

Performance Considerations

As mentioned, CurrentUserMiddleware will register a signal handler for every request that Django processes, checking for instances of CurrentUserField every time a model is saved within a request. On most small sites, this additional overhead is hardly noticeable, but high-volume sites may notice a reduction in the quality of the user experience. The benefits of data surveillance aren't worth degrading the experience provided to your users.

One way to keep overhead to a minimum is by restricting it to situations where updates are likely to take place. Chapter 7 explained how the HTTP standard expects certain methods to be "safe"—simply viewing a document shouldn't make any changes. These safe methods are GET, HEAD, OPTIONS and TRACE; process_request() can be written to special-case these methods, bypassing any further handling.

from django.db.models import signals
from django.utils.functional import curry

from current_user import registration

class CurrentUserMiddleware(object):
    def process_request(self, request):
        if request.method in ('GET', 'HEAD', 'OPTIONS', 'TRACE'):
            # This request shouldn't update anything,
            # so no signal handler should be attached.
            return

        if hasattr(request, 'user') and request.user.is_authenticated():
            user = request.user
        else:
            user = None

        update_users = curry(self.update_users, user)
        signals.pre_save.connect(update_users, dispatch_uid=request, weak=False)

    def update_users(self, user, sender, instance, **kwargs):
        registry = registration.FieldRegistry()
        if sender in registry:
            for field in registry.get_fields(sender):
                setattr(instance, field.name, user)

    def process_response(self, request, response):
        signals.pre_save.disconnect(dispatch_uid=request)
        return response

Even among requests that do modify data, not all views modify the models that are being managed by CurrentUserField. There's no way for a third-party application like this to programmatically know which models are modified by which views, so the middleware simply looks at all of them. This can be avoided by only applying this middleware on those views that you know modify the affected models.

To achieve this, we turn to django.utils.decorators, which contains the useful decorator_from_middleware function that was shown in Chapter 7. This utility function takes a middleware, like our CurrentUserMiddleware, and converts it into a decorator that can be applied to just those views that need its features. This new decorator can be provided in the middleware module, right alongside the middleware it accesses.

from django.db.models import signals
from django.utils.functional import curry
from django.utils.decorators import decorator_from_middleware

from current_user import registration

class CurrentUserMiddleware(object):
    def process_request(self, request):
        if request.method in ('GET', 'HEAD', 'OPTIONS', 'TRACE'):
            # This request shouldn't update anything,
            # so no signal handler should be attached.
            return

        if hasattr(request, 'user') and request.user.is_authenticated():
            user = request.user
        else:
            user = None

        update_users = curry(self.update_users, user)
        signals.pre_save.connect(update_users, dispatch_uid=request, weak=False)

    def update_users(self, user, sender, instance, **kwargs):
        registry = registration.FieldRegistry()
        if sender in registry:
            for field in registry.get_fields(sender):
                setattr(instance, field.name, user)

    def process_response(self, request, response):
        signals.pre_save.disconnect(dispatch_uid=request)
        return response

record_current_user = decorator_from_middleware(CurrentUserMiddleware)
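To get a feel for what such a conversion involves, consider the following simplified sketch. It is not Django's implementation: simple_decorator_from_middleware() and RecordingMiddleware are hypothetical names, and the sketch ignores process_view() and process_exception() entirely.

```python
from functools import wraps


def simple_decorator_from_middleware(middleware_class):
    """Hypothetical, simplified take on the idea; not Django's code."""
    def decorator(view):
        @wraps(view)
        def wrapped(request, *args, **kwargs):
            middleware = middleware_class()
            if hasattr(middleware, 'process_request'):
                result = middleware.process_request(request)
                if result is not None:
                    # The middleware short-circuited the request.
                    return result
            response = view(request, *args, **kwargs)
            if hasattr(middleware, 'process_response'):
                response = middleware.process_response(request, response)
            return response
        return wrapped
    return decorator


class RecordingMiddleware(object):
    """Hypothetical middleware that just records when it was called."""
    calls = []

    def process_request(self, request):
        self.calls.append('request')

    def process_response(self, request, response):
        self.calls.append('response')
        return response


@simple_decorator_from_middleware(RecordingMiddleware)
def view(request):
    RecordingMiddleware.calls.append('view')
    return 'ok'


result = view(object())
```

The middleware's request and response phases now wrap only the decorated view, rather than every view on the site.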

Now it's possible to import this decorator and apply it to the views that modify any of the models with a CurrentUserField attached. One obvious example is the admin interface, which shouldn't be open to the general public and should therefore have a limited number of users. This does raise one last problem: the decorator produced by decorator_from_middleware only works on functions, not on callable objects.

The admin site uses an object—typically django.contrib.admin.site.root—as the view in URL configurations, so the record_current_user decorator won't work with it directly. Instead, a small wrapper function needs to be placed between the two, which can satisfy the decorator while passing everything through to the admin site object.

from django.conf.urls.defaults import *
from django.contrib import admin

from current_user.middleware import record_current_user

admin.autodiscover()

urlpatterns = patterns('',
    (r'^admin/doc/', include('django.contrib.admindocs.urls')),
    (r'^admin/(.*)', record_current_user(lambda *args: admin.site.root(*args))),

    # The rest of the site gets configured here.
)

If the vast majority of the site's views do update models that have CurrentUserField attributes—to be expected if you're tracking all the models in your applications—the programmer overhead of having to import and apply the decorator to every view may not be worth it. Since nearly all views would need the decorator, applying the middleware makes more sense in that situation.

Keeping Historical Records

Capturing the last user to make a change is useful to a point, but finding out more information requires talking to that user in person and asking what was changed. Worse yet, there's no record of who else changed anything previously, so there's no way to know what path a record took from beginning to end.

By bringing together even more of the techniques listed in this book—dynamic models, custom field-like objects, descriptors and curried functions for a start—we can supply a framework for tracking the changes of any model in any application under your control. This includes who changed the model, when it was changed and what it looked like at the time. In keeping with DRY, it's even possible to add this functionality to a model with a single line.

Intended Usage

Managing the history of objects requires a fairly detailed application, and it can be hard to understand the end goal when looking at everything individually. This section provides an overview of the features that will be available when the application is completed, so you can start seeing these features fall into place as the code progresses.

First is the act of assigning a history manager to the model that will be archived. This should be as simple as possible, preferably just a single attribute assignment. Simply pick a name and assign an object, just like Django's own model fields.

from django.db import models
from django.contrib.auth.models import User

import history

class Contact(models.Model):
    user = models.OneToOneField(User)
    phone_number = models.CharField(max_length=15, blank=True)
    address = models.CharField(max_length=255, blank=True)
    city = models.CharField(max_length=255, blank=True)
    state = models.CharField(max_length=255, blank=True)
    zip_code = models.CharField('ZIP code', max_length=10, blank=True)

    history = history.HistoricalRecords()

    # Descriptors and methods are here (see Chapter 10)

That's enough to get everything configured. From there, the framework is able to set up a model behind the scenes to store old records and the history attribute can be used to access those records using Django's standard database API methods. Consider a Contact object for the author of this book:

>>> from contacts.models import Contact
>>> from django.contrib.auth.models import User
>>> author_user = User.objects.get(username='martyalchin')
>>> author = Contact.objects.create(user=author_user, state='MI')
>>> print '%s (%s)' % (author, author._meta.object_name)
Marty Alchin (Contact)

This object will function just as it normally would, but with the addition of the history attribute, additional information about the author's history is available. To start, one historical record is available from when the Contact object was first created.

>>> for record in author.history.all():
...     print '%s (%s)' % (record, record._meta.object_name)
...
Marty Alchin as of 2008-10-08 16:09:57 (HistoricalContact)

Note: Historical records won't magically exist for data that was already in the database. Only new records and updates will get tracked. If you'd like to make a historical record for each row in your existing database, simply save them all one at a time, using a loop such as [c.save() for c in Contact.objects.all()]. This may be problematic for large databases, where some custom SQL may be more appropriate.

If the contact changes his phone number, it's necessary to update the Contact record accordingly. That change also shows up as a new historical record.

>>> author.phone_number
u'555-555-5555'
>>> author.phone_number = '517-555-2424'
>>> author.save()
>>> for record in author.history.all():
...     print u'%s (%s)' % (record, record.phone_number)
...
Marty Alchin as of 2008-10-08 16:46:51 (517-555-2424)
Marty Alchin as of 2008-10-08 16:09:57 (555-555-5555)

Notice that they're sorted with the most recent record first. This allows for some simple methods to be added, making it easier to get older copies. For instance, the history manager also has a most_recent() method, which returns a Contact object with its attributes set to those found in the most recent historical record.

>>> recent = author.history.most_recent()
>>> print '%s (%s)' % (recent, recent.phone_number)
Marty Alchin (517-555-2424)

Even though each historical record is a different model than the original, a true Contact object is available from any HistoricalContact by using the history_object attribute.

>>> record = author.history.all()[0]
>>> print record
Marty Alchin as of 2008-10-08 16:46:51
>>> print type(record)
<class 'contacts.models.HistoricalContact'>
>>> print record.history_object
Marty Alchin
>>> print type(record.history_object)
<class 'contacts.models.Contact'>

In the event that a specific date is known, the historical manager also includes a shortcut function to return a Contact object containing the values that were true for the given object as of the date specified.

>>> import datetime

# 'then' is just minutes before the last update
>>> then = datetime.datetime(2008, 10, 8, 16, 45)
>>> old_contact = author.history.as_of(then)
>>> print '%s (%s)' % (old_contact, old_contact.phone_number)
Marty Alchin (555-555-5555)
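The lookup behind a method like as_of() amounts to finding the newest record made on or before the given date. The following plain-Python sketch shows that selection logic using the timestamps from the examples above; the tuple layout and this standalone as_of() function are assumptions made for illustration, not the manager's actual implementation.

```python
import datetime

# (timestamp, phone_number) pairs, most recent first, mirroring the
# ordering of the historical records shown earlier.
records = [
    (datetime.datetime(2008, 10, 8, 16, 46, 51), '517-555-2424'),
    (datetime.datetime(2008, 10, 8, 16, 9, 57), '555-555-5555'),
]


def as_of(records, date):
    """Return the newest value recorded on or before the given date."""
    for timestamp, value in records:
        if timestamp <= date:
            return value
    raise LookupError('No record exists as of %s' % date)


then = datetime.datetime(2008, 10, 8, 16, 45)
old_number = as_of(records, then)
```

Because the records are ordered with the most recent first, the first match is the one that was current at the requested time.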

Even after a contact has been deleted, a record of it still remains. In fact, a new record is added to indicate when the contact was deleted. Given the original ID, an empty Contact object can be used to retrieve historical records, including the most_recent() method for a kind of "undo" functionality.

>>> author.delete()
>>> author = Contact(pk=1) # Note: not retrieved from the database
>>> old_contact = author.history.most_recent()
>>> print '%s (%s)' % (old_contact, old_contact.phone_number)
Marty Alchin (517-555-2424)

Each historical record is identified as one of three types: created, changed or deleted.

>>> for record in author.history.all():
...     print u' %s %s (%s)' % (record.history_type, record,
...                             record.get_history_type_display())
...
 - Marty Alchin as of 2008-10-08 17:19:13 (Deleted)
 ~ Marty Alchin as of 2008-10-08 16:46:51 (Changed)
 + Marty Alchin as of 2008-10-08 16:09:57 (Created)

Overview of the Process

The whole registration of a history manager begins by assigning a HistoricalRecords object to a model, so that's a good place to start defining code. This will live in the models.py module of a new history application. There are a number of things that have to happen in sequence to get the history system initialized for a particular model; at a high level, HistoricalRecords manages all of the following tasks:

  1. Create a copy of the model it's attached to, with additional fields for auditing purposes.
  2. Register signal handlers to execute when the original model is saved or deleted. These in turn add new historical records each time the model is modified.
  3. Assign a manager to the original model, using the attribute name where the HistoricalRecords was assigned. This manager will then access historical information.

That's a short list, but each step requires a fair amount of code, combining several of the techniques described throughout this book. Before any of those steps can really begin, there's a small amount of housekeeping that must be done. Since the HistoricalRecords object gets assigned as an attribute of a model, the first chance it gets to execute is in the contribute_to_class() method.

class HistoricalRecords(object):
    def contribute_to_class(self, cls, name):
        self.manager_name = name

So far it's not much, but this is the only point in the process where Django tells HistoricalRecords what name it was given when assigned to the model. This is stored away for future reference during Step 3.

Step 1: Copy the Model

In order to store the data from a model instance in a historical record that can be easily added, searched and retrieved, we need a new model behind the scenes. In theory, we could use any structure that can contain data; perhaps a single TextField that contains pickled objects. But to search and browse the historical data more easily, it makes sense to use the same data structure as the original model itself.

Chapter 3 showed that a model's _meta attribute contains all the information about how that model was defined, including all of its fields in the order they were declared. This information is crucial, because it allows us to create a new model that matches that same data structure. The only trouble is that contribute_to_class() gets called on each field in turn, in the order they appear in the namespace dictionary Python created for the model's definition. Since standard dictionaries don't have a guaranteed order, there's no way to predict how many fields will already have been processed by the time HistoricalRecords gets a chance to peek at the model.

To solve this, we turn to a signal: class_prepared. Django fires this signal once all the fields and managers have been added to the model and everything is in place to be used by external code. That's when HistoricalRecords will have guaranteed access to all the fields, including the order in which they were defined, so contribute_to_class() continues by setting up a listener for class_prepared.

from django.db import models

class HistoricalRecords(object):
    def contribute_to_class(self, cls, name):
        self.manager_name = name
        models.signals.class_prepared.connect(self.finalize, sender=cls)

Django will now call HistoricalRecords.finalize() with the fully prepared model once everything is in place to continue processing it. That method is then responsible for performing all of the remaining tasks, all the way through Step 3. Most of the details are delegated to other methods, but finalize() coordinates them.
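The reason for deferring work until the class is prepared can be seen without Django at all. The following is a rough sketch in plain Python, not Django's actual implementation: a hypothetical ModelBase metaclass hands each attribute to contribute_to_class() and then notifies listeners once the class is complete. The names ModelBase, prepared_listeners, Field and Recorder are invented for illustration.

```python
# Stand-in for the class_prepared signal: (sender, callback) pairs.
prepared_listeners = []


class ModelBase(type):
    """Hypothetical metaclass that mimics the order of events."""

    def __new__(mcs, name, bases, attrs):
        cls = super(ModelBase, mcs).__new__(mcs, name, bases, {})
        cls.field_names = []
        for attr_name, value in attrs.items():
            if hasattr(value, 'contribute_to_class'):
                value.contribute_to_class(cls, attr_name)
            else:
                setattr(cls, attr_name, value)
        # Every attribute is attached; notify listeners for this class.
        for sender, callback in prepared_listeners:
            if sender is cls:
                callback(cls)
        return cls


class Field(object):
    def contribute_to_class(self, cls, name):
        cls.field_names.append(name)


class Recorder(object):
    def contribute_to_class(self, cls, name):
        self.name = name
        # How many fields exist at this point depends on processing order.
        self.seen_early = len(cls.field_names)
        prepared_listeners.append((cls, self.finalize))

    def finalize(self, cls):
        # By now every field has been added to the class.
        self.seen_late = len(cls.field_names)


recorder = Recorder()
Contact = ModelBase('Contact', (object,), {
    'name': Field(),
    'history': recorder,
    'phone': Field(),
})
```

Whatever value seen_early ends up with, seen_late always reflects the complete set of fields, which is the guarantee finalize() relies on.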

The first thing finalize() needs to do is copy the original model to create a new model with extra fields attached. It defers this task to the create_history_model() method, which in turn relies on a few other methods.

import copy
import datetime

from django.db import models

from current_user import models as current_user

class HistoricalRecords(object):
    def contribute_to_class(self, cls, name):
        self.manager_name = name
        models.signals.class_prepared.connect(self.finalize, sender=cls)

    def finalize(self, sender, **kwargs):
        history_model = self.create_history_model(sender)

    def create_history_model(self, model):
        """
        Creates a historical model to associate with the model provided.
        """
        attrs = self.copy_fields(model)
        attrs.update(self.get_extra_fields(model))
        attrs.update(Meta=type('Meta', (), self.get_meta_options(model)))
        name = 'Historical%s' % model._meta.object_name
        return type(name, (models.Model,), attrs)

There are a few different sub-steps required in creating a model like this. Adding all the logic in one method would hamper readability and maintainability, so it's been broken up into three additional methods.
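Because the three-argument form of type() does the heavy lifting in create_history_model(), a plain-Python sketch may help show what that call produces. No Django is involved here; the Base class and the attribute values are placeholders chosen for illustration.

```python
class Base(object):
    """Placeholder for models.Model."""

    def describe(self):
        return '%s ordered by %s' % (type(self).__name__,
                                     ', '.join(self.Meta.ordering))


original_name = 'Contact'

# The same three pieces create_history_model() assembles: a dictionary of
# attributes, a Meta class built on the fly and a computed class name.
attrs = {'__module__': __name__, 'phone_number': '555-555-5555'}
attrs.update(Meta=type('Meta', (), {'ordering': ('-history_date',)}))
name = 'Historical%s' % original_name

HistoricalContact = type(name, (Base,), attrs)
```

The resulting class behaves as if it had been written out with a class statement, including the inner Meta class.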

Copying the Model's Fields

The copy_fields() method is tasked with copying the existing fields on the model, returning a dictionary with new fields that can be applied to the history model. This is a more complicated task than it may sound, because there are a few special cases that need to be accounted for, but Python provides a tool to help with the common case.

Python's copy module is designed to copy an object and all of its attributes into a new object. This operation is necessary in the event that any of these field attributes get changed; changing one field shouldn't affect another. If we simply assigned the existing field to the new model, the two would be the same object, sharing a namespace. A change to one would affect the other, which isn't a good thing. The copy.copy() function takes care of our needs.
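The difference between assigning and copying is easy to demonstrate. This small sketch uses a stand-in Field class, not Django's, so it can be run anywhere.

```python
import copy


class Field(object):
    """Minimal stand-in for a model field; not Django's Field class."""

    def __init__(self, name, unique=False):
        self.name = name
        self.unique = unique


original = Field('email', unique=True)

# Plain assignment: both names point at the same object.
alias = original

# copy.copy(): a new object with its own attribute namespace.
duplicate = copy.copy(original)

# Changing the copy leaves the original untouched.
duplicate.unique = False
```

Keep in mind that copy.copy() makes a shallow copy, so any attribute that is itself a mutable object is still shared between the two. That's acceptable here because copy_fields() only reassigns simple flags.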

After copying each field, copy_fields() has to take care of two special cases. The first is that a model can only ever contain one AutoField attribute, and it must be the primary key. Django provides an AutoField as the primary key for any model that doesn't explicitly declare a different primary key, so this is a common case. Since there will likely be multiple historical records for a given ID, the history model has a separate AutoField for its primary key. Any existing AutoField instances that are found on the original model must be changed to a standard IntegerField on the history model.

The next special case to take care of is that uniqueness can no longer be guaranteed on any field. Both the unique and primary_key arguments imply that a field's value must be unique across all rows in the model, which won't be true in a historical context. Any field found with either of these attributes set to True is changed to False, with db_index set to True instead. Having a unique field on the original model implies some importance, so adding an index to it will help speed up queries that rely on that field's content.
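Both special cases can be tried out on stand-in classes before involving a real model. The sketch below is only an approximation: Field, AutoField and IntegerField are simplified placeholders rather than Django's classes, and historical_copy() is a hypothetical helper that handles a single field.

```python
import copy


class Field(object):
    """Stand-in for a model field; only the relevant flags are kept."""

    def __init__(self, name, primary_key=False, unique=False):
        self.name = name
        self.primary_key = primary_key
        self.unique = unique
        self.db_index = False


class AutoField(Field):
    pass


class IntegerField(Field):
    pass


def historical_copy(field):
    """Returns a copy of the field that is safe to repeat across rows."""
    field = copy.copy(field)
    if isinstance(field, AutoField):
        # Swap the class in place; the attributes are left as they were.
        field.__class__ = IntegerField
    if field.primary_key or field.unique:
        field.primary_key = False
        field.unique = False
        field.db_index = True
    return field


original_id = AutoField('id', primary_key=True)
history_id_column = historical_copy(original_id)
history_email = historical_copy(Field('email', unique=True))
history_name = historical_copy(Field('name'))
```

The original AutoField is unaffected, while its copy becomes an indexed, non-unique IntegerField. Fields that were never unique pass through unchanged.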

In addition to the fields themselves, every model needs an attribute named __module__ that Django can use to determine what application it belongs to. Since this history model will be tied to the original model, __module__ can be copied straight over to the new model along with its fields. This way, copy_fields() provides everything necessary to make the history model function like the original model as much as possible.

    def copy_fields(self, model):
        """
        Creates copies of the model's original fields, returning
        a dictionary mapping field name to copied field object.
        """
        # Though not strictly a field, this attribute
        # is required for a model to function properly.
        fields = {'__module__': model.__module__}

        for field in model._meta.fields:
            field = copy.copy(field)

            if isinstance(field, models.AutoField):
                # The historical model gets its own AutoField, so any
                # existing one must be replaced with an IntegerField.
                field.__class__ = models.IntegerField

            if field.primary_key or field.unique:
                # Unique fields can no longer be guaranteed unique,
                # but they should still be indexed for faster lookups.
                field.primary_key = False
                field._unique = False
                field.db_index = True

            fields[field.name] = field

        return fields

Adding Record-Keeping Fields

So far, the history model is only set up to store the same values as the original model. It does keep a historical record of each stage an instance went through, but without anything else, it's of little real-world value. It needs some extra information along with that data to make it useful. There are a few basic components that are useful in nearly all situations.

  • The date the model instance was changed.
  • The user who initiated the change, if any. This is controlled using the CurrentUserField explained earlier in this chapter.
  • The type of change that took place. This will be one of three values: '+' for a new instance, '~' for an update to an existing instance and '-' for a deleted instance.

Most models also include a __unicode__() method that controls how a model instance will be displayed when printed to a console or written to a string, such as a template. To preserve this while still indicating its historical status, get_extra_fields() provides a new __unicode__() method that simply uses the original method and adds a date to the end of the string. This is done with the help of a special history_object attribute, which will be explained in the next section.

   def get_extra_fields(self, model):
       """
       Returns a dictionary of fields that will be added to the historical
       record model, in addition to the ones returned by copy_fields below.
       """
       rel_nm = '_%s_history' % model._meta.object_name.lower()
       return {
           'history_id': models.AutoField(primary_key=True),
           'history_date': models.DateTimeField(default=datetime.datetime.now),
           'history_user': current_user.CurrentUserField(related_name=rel_nm),
           'history_type': models.CharField(max_length=1, choices=(
               ('+', 'Created'),
               ('~', 'Changed'),
               ('-', 'Deleted'),
           )),
           'history_object': HistoricalObjectDescriptor(model),
           '__unicode__': lambda self: u'%s as of %s' % (self.history_object,
                                                         self.history_date)
       }

One advantage of providing this extra information in a separate method is that get_extra_fields() offers a chance for customization. Many projects have some additional information, such as a SITE_ID, that could be logged alongside this information to give greater insight into the data. Overriding get_extra_fields() provides an opportunity to easily add those extra fields.

from django.conf import settings
from django.db import models

class SiteHistoricalRecords(HistoricalRecords):
    def get_extra_fields(self, model):
        fields = super(SiteHistoricalRecords, self).get_extra_fields(model)
        fields.update({
            'history_site': models.IntegerField(default=settings.SITE_ID),
        })
        return fields

In addition, overriding get_extra_fields() allows other types of customizations. If the provided field names or the __unicode__() implementation don't suit your taste, feel free to replace them. This makes get_extra_fields() the method for the more flexible aspects of the history model.

Accessing a True Model Instance

Since the history model is designed for storing and retrieving information about what a model instance looked like at points in the past, it makes sense to have access to an instance of the original model with the historical values. This provides access to any custom methods or other attributes that didn't get copied over to the history model. This is especially necessary for implementing a proper __unicode__() representation.

Since the historical record of an instance contains all of the field values of the instance itself, no additional database calls are necessary to populate an instance of the original model. This is accessible on a historical instance through the history_object attribute, which is implemented as a descriptor.

class HistoricalObjectDescriptor(object):
    def __init__(self, model):
        self.model = model

    def __get__(self, instance, owner):
        values = (getattr(instance, f.attname) for f in self.model._meta.fields)
        return self.model(*values)

It needs to take the original model as an argument and use that instead of the owner argument to the __get__() method because owner is the history model, not the original model. Using the original model's collection of fields as a guide, the descriptor pulls the appropriate values from the instance and creates a new instance. This new instance has all the original methods, including save(), which can be used to restore an older copy of an instance.
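The descriptor's behavior can be seen in isolation with a pair of ordinary classes. This is a simplified sketch with no database involved: Contact and HistoricalContact are hand-written stand-ins, and field_names is a hypothetical substitute for the model's _meta.fields.

```python
class Contact(object):
    """Stand-in for the original model; takes positional field values."""
    field_names = ('id', 'name', 'phone_number')

    def __init__(self, *values):
        for field_name, value in zip(self.field_names, values):
            setattr(self, field_name, value)

    def __str__(self):
        return self.name


class HistoricalObjectDescriptor(object):
    def __init__(self, model):
        self.model = model

    def __get__(self, instance, owner):
        # 'owner' is the class holding the descriptor (the history class),
        # so the original class has to come from self.model instead.
        values = (getattr(instance, f) for f in self.model.field_names)
        return self.model(*values)


class HistoricalContact(object):
    """Stand-in for the history model, with one extra history field."""
    history_object = HistoricalObjectDescriptor(Contact)

    def __init__(self, id, name, phone_number, history_type):
        self.id = id
        self.name = name
        self.phone_number = phone_number
        self.history_type = history_type


record = HistoricalContact(1, 'Marty Alchin', '555-555-5555', '~')
restored = record.history_object
```

The restored object is a genuine Contact, carrying only the fields Contact knows about. The extra history_type value stays behind on the record.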

Adding Meta Options

Another necessary item for creating the history model is a dictionary of options to be included as the model's Meta inner class. The only option that is actually required is ordering, which makes sure that the records are sorted in descending order by date.

   def get_meta_options(self, model):
       """
       Returns a dictionary of options that will be added to
       the Meta inner class of the historical record model.
       """
       return {
           'ordering': ('-history_date',),
       }

Other implementations can override this method to add more options as well, if necessary. With nearly everything in place, all that's left is for create_history_model() to create a new name for the history model and pass everything to type() to create it. Then, finalize() can use that new model to perform additional tasks.

Step 2: Register Signal Handlers

There are two ways of modifying a model instance, and Django provides signals to hook into both of them; the post_save and post_delete signals are fired when saving and deleting an instance, respectively.

   def finalize(self, sender, **kwargs):
       history_model = self.create_history_model(sender)

       # The HistoricalRecords object will be discarded,
       # so the signal handlers can't use weak references.
       models.signals.post_save.connect(self.post_save, sender=sender,
                                                        weak=False)
       models.signals.post_delete.connect(self.post_delete, sender=sender,
                                                            weak=False)

The HistoricalRecords object isn't used for anything after its initial setup, so it's discarded by Python's garbage collection fairly quickly. Because the signal handlers are methods of that object and signals use weak references by default, the handlers get removed from the signal as soon as HistoricalRecords goes away. Passing weak=False forces the signals to use strong references for these methods, keeping them alive long enough to do their jobs.
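To see why the choice of reference type matters, consider this toy dispatcher. It is only a sketch and not how Django's signals are implemented: connect() and live_receivers() are hypothetical helpers, and weakref.WeakMethod requires Python 3.4 or later.

```python
import gc
import weakref


class Handler(object):
    def on_save(self):
        return 'recorded'


def connect(receivers, method, weak=True):
    """Stores a bound method either weakly or strongly."""
    if weak:
        receivers.append(weakref.WeakMethod(method))
    else:
        # A closure keeps the method, and therefore its object, alive.
        receivers.append(lambda: method)


def live_receivers(receivers):
    """Returns the receivers whose target objects still exist."""
    return [ref() for ref in receivers if ref() is not None]


weak_receivers, strong_receivers = [], []

first = Handler()
connect(weak_receivers, first.on_save)

second = Handler()
connect(strong_receivers, second.on_save, weak=False)

# Drop the only outside references, as happens to HistoricalRecords.
del first, second
gc.collect()
```

Once the outside references are gone, the weakly connected handler disappears, while the strongly connected one can still be called.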

Like most of this system, the actual implementations of these two signal handlers each delegate to a separate method to reuse code. They both perform the same task, adding an entry to the history model, so it makes sense to share code as well. The only difference between the two is the value each provides for the history_type field of the historical record.

    def post_save(self, instance, created, **kwargs):
        self.create_historical_record(instance, created and '+' or '~')

    def post_delete(self, instance, **kwargs):
        self.create_historical_record(instance, '-')

    def create_historical_record(self, instance, type):
        manager = getattr(instance, self.manager_name)
        attrs = {}
        for field in instance._meta.fields:
            attrs[field.attname] = getattr(instance, field.attname)
        manager.create(history_type=type, **attrs)

The manager used to create this entry in the history model is determined according to the manager_name attribute that was set aside when contribute_to_class() was called at the beginning of the process. The manager assigned there is the last step of the process.

Step 3: Assign a Manager

In order to access the historical records for a given model, a manager is attached to the original model using the name where the HistoricalRecords object was assigned. The object that gets assigned is actually a descriptor, which creates a customized manager when accessed. All the manager code is located in a new module, manager.py, which is referenced from models.py as follows:

import copy
import datetime

from django.db import models

from current_user import models as current_user
import history.manager

class HistoricalRecords(object):
    def contribute_to_class(self, cls, name):
        self.manager_name = name
        models.signals.class_prepared.connect(self.finalize, sender=cls)

    def finalize(self, sender, **kwargs):
        history_model = self.create_history_model(sender)

        # The HistoricalRecords object will be discarded,
        # so the signal handlers can't use weak references.
        models.signals.post_save.connect(self.post_save, sender=sender,
                                                         weak=False)
        models.signals.post_delete.connect(self.post_delete, sender=sender,
                                                             weak=False)

        descriptor = history.manager.HistoryDescriptor(history_model)
        setattr(sender, self.manager_name, descriptor)

    # Additional methods described in previous sections

The addition of those lines completes the code necessary in models.py; the remainder of the functionality is implemented in manager.py instead. The descriptor that gets assigned to the original model is fairly simple, storing the history model and using that to create customized managers. The HistoryManager then accepts the history model and instance and stores them for later. Note that instance is an optional argument, allowing the manager to be used on the original model itself. This, in turn, will retrieve all historical records for that model, regardless of what instance they are attached to.

Notice also that the model attribute is received and stored, but not used by any of this code. Django's own Manager class uses self.model to determine what model it should reference in database queries. Assigning the right model to self.model is all that's necessary to tell Django how to get data from the correct table and formulate results using the appropriate instances.

from django.db import models

class HistoryDescriptor(object):
    def __init__(self, model):
        self.model = model

    def __get__(self, instance, owner):
        if instance is None:
            return HistoryManager(self.model)
        return HistoryManager(self.model, instance)

class HistoryManager(models.Manager):
    def __init__(self, model, instance=None):
        super(HistoryManager, self).__init__()
        self.model = model
        self.instance = instance

    def get_query_set(self):
        if self.instance is None:
            return super(HistoryManager, self).get_query_set()

        filter = {self.instance._meta.pk.name: self.instance.pk}
        return super(HistoryManager, self).get_query_set().filter(**filter)

This overridden get_query_set() method is what allows a HistoryManager to retrieve objects matching the ID of a given instance of the original model. No special ordering is required because the history model's Meta inner class already has an ordering option set. In addition to simply retrieving a list of related historical records, HistoryManager can contain extra methods to perform more specific searches.
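The interplay between the descriptor and the manager can be sketched without a database. In this simplified version, which is an illustration rather than the chapter's code, the manager filters an in-memory list instead of a QuerySet, and the records list and the all() method are hypothetical stand-ins.

```python
class HistoryManager(object):
    """Toy manager over an in-memory list instead of a database table."""
    records = []  # every historical row, newest first

    def __init__(self, instance=None):
        self.instance = instance

    def all(self):
        if self.instance is None:
            # Accessed on the class: return history for every instance.
            return list(self.records)
        return [r for r in self.records if r['id'] == self.instance.id]


class HistoryDescriptor(object):
    def __get__(self, instance, owner):
        # instance is None when the attribute is accessed on the class.
        if instance is None:
            return HistoryManager()
        return HistoryManager(instance)


class Contact(object):
    history = HistoryDescriptor()

    def __init__(self, id):
        self.id = id


HistoryManager.records.extend([
    {'id': 2, 'name': 'Second contact'},
    {'id': 1, 'name': 'Marty Alchin'},
    {'id': 1, 'name': 'Marty'},
])
```

Here, Contact.history.all() returns all three rows, while Contact(1).history.all() returns only the two rows recorded for that ID.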

Retrieving the Most Recent Copy of an Instance

If a model instance has been changed since it was last saved, or has since been deleted, it's useful to be able to retrieve its last known state quickly and easily. Since that information is stored in a history model, which in turn is accessible by the HistoryManager, a new manager method can do this job without requiring any arguments at all. The first requirement is that this method should not be available on the model itself, only on instances.

   def most_recent(self):
       """
       Returns the most recent copy of the instance available in the history.
       """
       if not self.instance:
           # self.instance is None here, so the message can't name its model.
           raise TypeError("Can't use most_recent() without a model instance.")

Now that we can be sure there is a valid model instance to work with, the next step is to gather up the field names that exist on the model, so that only those fields are retrieved. This method returns an instance of the original model, not the history model. Retrieving any additional fields would not only be wasteful, it would also require more code to remove them before populating the model instance, since those extra fields aren't supported by that model.

   def most_recent(self):
       """
       Returns the most recent copy of the instance available in the history.
       """
       if not self.instance:
           # self.instance is None here, so the message can't name its model.
           raise TypeError("Can't use most_recent() without a model instance.")
       fields = (field.name for field in self.instance._meta.fields)

Note that this needs to use the _meta attribute of self.instance, rather than self.model because self.model is the history model, not the original model that we're keeping track of. With a list of fields in place, a simple call to values_list() retrieves the values for all recorded states for the given instance. Because those states are sorted descending by date, the first row is always the most recent, so using an index of 0 will issue the appropriate query.

   def most_recent(self):
       """
       Returns the most recent copy of the instance available in the history.
       """
       if not self.instance:
           # self.instance is None here, so the message can't name its model.
           raise TypeError("Can't use most_recent() without a model instance.")
       fields = (field.name for field in self.instance._meta.fields)
       try:
           values = self.values_list(*fields)[0]
       except IndexError:
           raise self.instance.DoesNotExist("%s has no historical record." %
                                            self.instance._meta.object_name)

Catching IndexError allows for a more useful error message in the event that there is no history data available for the given instance. In this case, a different exception is raised, using the model's own DoesNotExist class to keep in line with the way Django's own instance lookups work. If no error is raised, values will have all the values necessary to populate an instance of the original model. It then does exactly this and returns it for other code to use.

   def most_recent(self):
       """
       Returns the most recent copy of the instance available in the history.
       """
       if not self.instance:
           # self.instance is None here, so the message can't name its model.
           raise TypeError("Can't use most_recent() without a model instance.")
       fields = (field.name for field in self.instance._meta.fields)
       try:
           values = self.values_list(*fields)[0]
       except IndexError:
           raise self.instance.DoesNotExist("%s has no historical record." %
                                            self.instance._meta.object_name)
       return self.instance.__class__(*values)
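The same logic can be exercised against an in-memory list, which makes the two outcomes easy to see. This is a sketch under simplifying assumptions: most_recent() here is a standalone function rather than a manager method, the rows are dictionaries already sorted newest first, and DoesNotExist is a locally defined stand-in for the model's exception.

```python
class DoesNotExist(Exception):
    pass


def most_recent(rows, field_names):
    """
    Returns the original model's field values from the newest row.
    Rows are dictionaries sorted newest first.
    """
    try:
        row = rows[0]
    except IndexError:
        raise DoesNotExist("Contact has no historical record.")
    # Only the original model's fields are kept; history fields are skipped.
    return tuple(row[name] for name in field_names)


FIELDS = ('id', 'name', 'phone_number')
ROWS = [
    {'id': 1, 'name': 'Marty Alchin', 'phone_number': '517-555-2424',
     'history_type': '-'},
    {'id': 1, 'name': 'Marty Alchin', 'phone_number': '517-555-2424',
     'history_type': '~'},
    {'id': 1, 'name': 'Marty Alchin', 'phone_number': '555-555-5555',
     'history_type': '+'},
]

latest = most_recent(ROWS, FIELDS)
```

Note that a deleted instance still yields its last known values, which is what makes the "undo" behavior possible. Only an empty history raises DoesNotExist.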

Retrieving an Instance As It Existed at a Specific Point in Time

Similar to most_recent(), it's also sometimes useful to see what a model instance looked like on some specific date or at a particular time. This is useful, for instance, when customers ask about products or resources that they heard about some time ago. Being able to retrieve the item as it existed on the date in question can be a valuable tool in serving those customers' needs. Like most_recent(), the new as_of() method starts by making sure it only gets used on a model instance, rather than the model itself.

   def as_of(self, date):
       """
       Returns an instance of the original model with all the attributes set
       according to what was present on the object on the date provided.
       """
       if not self.instance:
           # self.instance is None here, so the message can't name its model.
           raise TypeError("Can't use as_of() without a model instance.")

The list of fields is retrieved the same way as in most_recent(), but it's not passed directly into a values_list() query. The as_of() query needs to limit its results to the data that was accurate on the date supplied, so we must first apply a filter() to satisfy that condition.

   def as_of(self, date):
       """
       Returns an instance of the original model with all the attributes set
       according to what was present on the object on the date provided.
       """
       if not self.instance:
           # self.instance is None here, so the message can't name its model.
           raise TypeError("Can't use as_of() without a model instance.")
       fields = (field.name for field in self.instance._meta.fields)
       qs = self.filter(history_date__lte=date)

This new QuerySet is what we need to retrieve the instance values for the particular date. Again, a values_list() query is used and limited to the first result to obtain the record nearest to the date provided. If no records are found, the same DoesNotExist exception is raised, but with a slightly different message to indicate that there may be records for the object, but none before the date specified.

   def as_of(self, date):
       """
       Returns an instance of the original model with all the attributes set
       according to what was present on the object on the date provided.
       """
       if not self.instance:
           # self.instance is None here, so the message can't name its model.
           raise TypeError("Can't use as_of() without a model instance.")
       fields = (field.name for field in self.instance._meta.fields)
       qs = self.filter(history_date__lte=date)
       try:
           values = qs.values_list('history_type', *fields)[0]
       except IndexError:
           raise self.instance.DoesNotExist("%s had not yet been created." %
                                            self.instance._meta.object_name)

Note also that the values_list() query used here includes an extra field not present in the most_recent() query. One last check must be performed on the data before it's used to populate an instance of the model, and the history_type is necessary for that. If the row returned from the query has a history_type of "-", that means the instance was deleted prior to the date specified, so it technically didn't exist as of that date. Rather than return an object that didn't exist, as_of() raises DoesNotExist, explaining what happened.

   def as_of(self, date):
       """
       Returns an instance of the original model with all the attributes set
       according to what was present on the object on the date provided.
       """
       if not self.instance:
           # self.instance is None here, so the message can't name its model.
           raise TypeError("Can't use as_of() without a model instance.")
       fields = (field.name for field in self.instance._meta.fields)
       qs = self.filter(history_date__lte=date)
       try:
           values = qs.values_list('history_type', *fields)[0]
       except IndexError:
           raise self.instance.DoesNotExist("%s had not yet been created." %
                                            self.instance._meta.object_name)
       if values[0] == '-':
           raise self.instance.DoesNotExist("%s had already been deleted." %
                                            self.instance._meta.object_name)

With all the sanity checks completed, we can be certain that the values retrieved are valid for an object that existed on the date passed to the method. Since the first value retrieved in the QuerySet was the history_type, which isn't part of the original model, a slice is taken to retrieve the rest of the values, which are then passed to the model instead.

   def as_of(self, date):
       """
       Returns an instance of the original model with all the attributes set
       according to what was present on the object on the date provided.
       """
       if not self.instance:
           # self.instance is None here, so the message can't name its model.
           raise TypeError("Can't use as_of() without a model instance.")
       fields = (field.name for field in self.instance._meta.fields)
       qs = self.filter(history_date__lte=date)
       try:
           values = qs.values_list('history_type', *fields)[0]
       except IndexError:
           raise self.instance.DoesNotExist("%s had not yet been created." %
                                            self.instance._meta.object_name)
       if values[0] == '-':
           raise self.instance.DoesNotExist("%s had already been deleted." %
                                            self.instance._meta.object_name)
       return self.instance.__class__(*values[1:])
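To watch all three outcomes of as_of() side by side, the following sketch replays the example history from earlier in this section against an in-memory list. It is an approximation only: as_of() is written as a standalone function, a list comprehension stands in for filter(history_date__lte=date), and DoesNotExist is a locally defined stand-in for the model's exception.

```python
import datetime


class DoesNotExist(Exception):
    pass


def as_of(rows, field_names, date):
    """Returns field values as they stood on the given date."""
    # Rows are sorted newest first, so the first match is the nearest one.
    candidates = [row for row in rows if row['history_date'] <= date]
    try:
        row = candidates[0]
    except IndexError:
        raise DoesNotExist("Contact had not yet been created.")
    if row['history_type'] == '-':
        raise DoesNotExist("Contact had already been deleted.")
    return tuple(row[name] for name in field_names)


FIELDS = ('name', 'phone_number')
ROWS = [
    {'history_date': datetime.datetime(2008, 10, 8, 17, 19, 13),
     'history_type': '-',
     'name': 'Marty Alchin', 'phone_number': '517-555-2424'},
    {'history_date': datetime.datetime(2008, 10, 8, 16, 46, 51),
     'history_type': '~',
     'name': 'Marty Alchin', 'phone_number': '517-555-2424'},
    {'history_date': datetime.datetime(2008, 10, 8, 16, 9, 57),
     'history_type': '+',
     'name': 'Marty Alchin', 'phone_number': '555-555-5555'},
]

then = datetime.datetime(2008, 10, 8, 16, 45)
old_values = as_of(ROWS, FIELDS, then)
```

A date before the first row raises the "not yet created" error, a date after the deletion raises the "already deleted" error and anything in between returns the values that were current at the time.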

Now What?

The tools and techniques discussed in this book go well beyond the official Django documentation, but there's still a lot left unexplored. There are plenty of other innovative ways to use Django and Python; the rest is up to you.

As you work your way through your own applications, be sure to consider giving back to the Django community. The framework is available because others decided to distribute it freely; by doing the same, you can help even more people uncover more possibilities. The Appendix explains how you can give back to the community and to the framework itself.
