Case study

To tie together some of the principles presented in this chapter, let's build a mailing list manager. The manager will keep track of e-mail addresses categorized into named groups. When it's time to send a message, we can pick a group and send the message to all e-mail addresses assigned to that group.

Now, before we start working on this project, we ought to have a safe way to test it, without sending e-mails to a bunch of real people. Luckily, Python has our back here; like the test HTTP server, it has a built-in Simple Mail Transfer Protocol (SMTP) server that we can instruct to capture any messages we send without actually sending them. We can run the server with the following command:

python -m smtpd -n -c DebuggingServer localhost:1025

Running this command at a command prompt will start an SMTP server running on port 1025 on the local machine. But we've instructed it to use the DebuggingServer class (it comes with the built-in SMTP module), which, instead of sending mails to the intended recipients, simply prints them on the terminal screen as it receives them. Neat, eh?

Now, before writing our mailing list, let's write some code that actually sends mail. Of course, Python supports this in the standard library, too, but it's a bit of an odd interface, so we'll write a new function to wrap it all cleanly:

import smtplib
from email.mime.text import MIMEText

def send_email(subject, message, from_addr, *to_addrs,
        host="localhost", port=1025, **headers):

    email = MIMEText(message)
    email['Subject'] = subject
    email['From'] = from_addr
    for header, value in headers.items():
        email[header] = value

    sender = smtplib.SMTP(host, port)
    for addr in to_addrs:
        del email['To']
        email['To'] = addr
        sender.sendmail(from_addr, addr, email.as_string())
    sender.quit()

We won't cover the code inside this method too thoroughly; the documentation in the standard library can give you all the information you need to use the smtplib and email modules effectively.

We've used both variable argument and keyword argument syntax in the function call. The variable argument list allows us to supply a single string in the default case of having a single to address, as well as permitting multiple addresses to be supplied if required. Any extra keyword arguments are mapped to e-mail headers. This is an exciting use of variable arguments and keyword arguments, but it's not really a great interface for the person calling the function. In fact, it makes many things the programmer will want to do impossible.

The headers passed into the function represent auxiliary headers that can be attached to a method. Such headers might include Reply-To, Return-Path, or X-pretty-much-anything. But in order to be a valid identifier in Python, a name cannot include the - character. In general, that character represents subtraction. So, it's not possible to call a function with Reply-To = [email protected]. It appears we were too eager to use keyword arguments because they are a new tool we just learned about in this chapter.

We'll have to change the argument to a normal dictionary; this will work because any string can be used as a key in a dictionary. By default, we'd want this dictionary to be empty, but we can't make the default parameter an empty dictionary. So, we'll have to make the default argument None, and then set up the dictionary at the beginning of the method:

def send_email(subject, message, from_addr, *to_addrs,
        host="localhost", port=1025, headers=None):

    headers = {} if headers is None else headers

If we have our debugging SMTP server running in one terminal, we can test this code in a Python interpreter:

>>> send_email("A model subject", "The message contents",
 "[email protected]", "[email protected]", "[email protected]")

Then, if we check the output from the debugging SMTP server, we get the following:

---------- MESSAGE FOLLOWS ----------
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: A model subject
From: [email protected]
To: [email protected]
X-Peer: 127.0.0.1

The message contents
------------ END MESSAGE ------------
---------- MESSAGE FOLLOWS ----------
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: A model subject
From: [email protected]
To: [email protected]
X-Peer: 127.0.0.1

The message contents
------------ END MESSAGE ------------

Excellent, it has "sent" our e-mail to the two expected addresses with subject and message contents included. Now that we can send messages, let's work on the e-mail group management system. We'll need an object that somehow matches e-mail addresses with the groups they are in. Since this is a many-to-many relationship (any one e-mail address can be in multiple groups; any one group can be associated with multiple e-mail addresses), none of the data structures we've studied seems quite ideal. We could try a dictionary of group-names matched to a list of associated e-mail addresses, but that would duplicate e-mail addresses. We could also try a dictionary of e-mail addresses matched to groups, resulting in a duplication of groups. Neither seems optimal. Let's try this latter version, even though intuition tells me the groups to e-mail address solution would be more straightforward.

Since the values in our dictionary will always be collections of unique e-mail addresses, we should probably store them in a set container. We can use defaultdict to ensure that there is always a set container available for each key:

from collections import defaultdict
class MailingList:
    '''Manage groups of e-mail addresses for sending e-mails.'''
    def __init__(self):
        self.email_map = defaultdict(set)

    def add_to_group(self, email, group):
        self.email_map[email].add(group)

Now, let's add a method that allows us to collect all the e-mail addresses in one or more groups. This can be done by converting the list of groups to a set:

    def emails_in_groups(self, *groups):
        groups = set(groups)
        emails = set()
        for e, g in self.email_map.items():
            if g & groups:
                emails.add(e)
        return emails

First, look at what we're iterating over: self.email_map.items(). This method, of course, returns a tuple of key-value pairs for each item in the dictionary. The values are sets of strings representing the groups. We split these into two variables named e and g, short for e-mail and groups. We add the e-mail address to the set of return values only if the passed in groups intersect with the e-mail address groups. The g & groups syntax is a shortcut for g.intersection(groups); the set class does this by implementing the special __and__ method to call intersection.

Now, with these building blocks, we can trivially add a method to our MailingList class that sends messages to specific groups:

def send_mailing(self, subject, message, from_addr,
        *groups, headers=None):
    emails = self.emails_in_groups(*groups)
    send_email(subject, message, from_addr,
            *emails, headers=headers)

This function relies on variable argument lists. As input, it takes a list of groups as variable arguments. It gets the list of e-mails for the specified groups and passes those as variable arguments into send_email, along with other arguments that were passed into this method.

The program can be tested by ensuring the SMTP debugging server is running in one command prompt, and, in a second prompt, loading the code using:

python -i mailing_list.py

Create a MailingList object with:

>>> m = MailingList()

Then create a few fake e-mail addresses and groups, along the lines of:

>>> m.add_to_group("[email protected]", "friends")
>>> m.add_to_group("[email protected]", "friends")
>>> m.add_to_group("[email protected]", "family")
>>> m.add_to_group("[email protected]", "professional")

Finally, use a command like this to send e-mails to specific groups:

>>> m.send_mailing("A Party",
"Friends and family only: a party", "[email protected]", "friends",
"family", headers={"Reply-To": "[email protected]"})

E-mails to each of the addresses in the specified groups should show up in the console on the SMTP server.

The mailing list works fine as it is, but it's kind of useless; as soon as we exit the program, our database of information is lost. Let's modify it to add a couple of methods to load and save the list of e-mail groups from and to a file.

In general, when storing structured data on disk, it is a good idea to put a lot of thought into how it is stored. One of the reasons myriad database systems exist is that if someone else has put this thought into how data is stored, you don't have to. For this example, let's keep it simple and go with the first solution that could possibly work.

The data format I have in mind is to store each e-mail address followed by a space, followed by a comma-separated list of groups. This format seems reasonable, and we're going to go with it because data formatting isn't the topic of this chapter. However, to illustrate just why you need to think hard about how you format data on disk, let's highlight a few problems with the format.

First, the space character is technically legal in e-mail addresses. Most e-mail providers prohibit it (with good reason), but the specification defining e-mail addresses says an e-mail can contain a space if it is in quotation marks. If we are to use a space as a sentinel in our data format, we should technically be able to differentiate between that space and a space that is part of an e-mail. We're going to pretend this isn't true, for simplicity's sake, but real-life data encoding is full of stupid issues like this. Second, consider the comma-separated list of groups. What happens if someone decides to put a comma in a group name? If we decide to make commas illegal in group names, we should add validation to ensure this to our add_to_group method. For pedagogical clarity, we'll ignore this problem too. Finally, there are many security implications we need to consider: can someone get themselves into the wrong group by putting a fake comma in their e-mail address? What does the parser do if it encounters an invalid file?

The takeaway from this discussion is to try to use a data-storage method that has been field tested, rather than designing your own data serialization protocol. There are a ton of bizarre edge cases you might overlook, and it's better to use code that has already encountered and fixed those edge cases.

But forget that, let's just write some basic code that uses an unhealthy dose of wishful thinking to pretend this simple data format is safe:

[email protected] group1,group2
[email protected] group2,group3

The code to do this is as follows:

    def save(self):
        with open(self.data_file, 'w') as file:
            for email, groups in self.email_map.items():
                file.write(
                    '{} {}
'.format(email, ','.join(groups))
                )

    def load(self):
        self.email_map = defaultdict(set)
        try:
            with open(self.data_file) as file:
                for line in file:
                    email, groups = line.strip().split(' ')
                    groups = set(groups.split(','))
                    self.email_map[email] = groups
        except IOError:
            pass

In the save method, we open the file in a context manager and write the file as a formatted string. Remember the newline character; Python doesn't add that for us. The load method first resets the dictionary (in case it contains data from a previous call to load) uses the for...in syntax, which loops over each line in the file. Again, the newline character is included in the line variable, so we have to call .strip() to take it off.

Before using these methods, we need to make sure the object has a self.data_file attribute, which can be done by modifying __init__:

    def __init__(self, data_file):
        self.data_file = data_file
        self.email_map = defaultdict(set)

We can test these two methods in the interpreter as follows:

>>> m = MailingList('addresses.db')
>>> m.add_to_group('[email protected]', 'friends')
>>> m.add_to_group('[email protected]', 'friends')
>>> m.add_to_group('[email protected]', 'family')
>>> m.save()

The resulting addresses.db file contains the following lines, as expected:

[email protected] friends
[email protected] friends,family

We can also load this data back into a MailingList object successfully:

>>> m = MailingList('addresses.db')
>>> m.email_map
defaultdict(<class 'set'>, {})
>>> m.load()
>>> m.email_map
defaultdict(<class 'set'>, {'[email protected]': {'friends
'}, '[email protected]': {'family
'}, '[email protected]': {'friends
'}})

As you can see, I forgot to do the load command, and it might be easy to forget the save command as well. To make this a little easier for anyone who wants to use our MailingList API in their own code, let's provide the methods to support a context manager:

    def __enter__(self):
        self.load()
        return self

    def __exit__(self, type, value, tb):
        self.save()

These simple methods just delegate their work to load and save, but we can now write code like this in the interactive interpreter and know that all the previously stored addresses were loaded on our behalf, and that the whole list will be saved to the file when we are done:

>>> with MailingList('addresses.db') as ml:
...    ml.add_to_group('[email protected]', 'friends')
...    ml.send_mailing("What's up", "hey friends, how's it going", '[email protected]', 'friends')
Case study
Case study
Case study
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset