Credit: Guido van Rossum, creator of Python
Network programming is one of my favorite Python applications. I wrote or started most of the network modules in the Python Standard Library, including the socket and select extension modules and most of the protocol client modules (such as ftplib). I also wrote a popular server framework module, SocketServer, and two web browsers in Python, the first predating Mosaic. Need I say more?
Python’s roots lie in a distributed operating system, Amoeba, which I helped design and implement in the late 1980s. Python was originally intended to be the scripting language for Amoeba, since it turned out that the Unix shell, while ported to Amoeba, wasn’t very useful for writing Amoeba system administration scripts. Of course, I designed Python to be platform independent from the start. Once Python was ported from Amoeba to Unix, I taught myself BSD socket programming by wrapping the socket primitives in a Python extension module and then experimenting with them using Python; this was one of the first extension modules.
This approach proved to be a great early testimony of Python’s strengths. Writing socket code in C is tedious: the code necessary to do error checking on every call quickly overtakes the logic of the program. Quick: in which order should a server call accept, bind, connect, and listen? This is remarkably difficult to find out if all you have is a set of Unix manpages. In Python, you don’t have to write separate error-handling code for each call, so the logic of the code stands out much more clearly. You can also learn about sockets by experimenting in an interactive Python shell, where misconceptions about the proper order of calls and the argument values that each call requires are cleared up quickly through Python’s immediate error messages.
Python has come a long way since those first days, and now few applications use the socket module directly; most use much higher-level modules such as urllib or smtplib, and third-party extensions such as the Twisted framework, whose popularity keeps growing. The examples in this chapter are a varied bunch: some construct and send complex email messages, while others dwell on lower-level issues such as tunneling. My favorite is Recipe 13.11, which implements PyHeartBeat: it’s useful, it uses the socket module, and it’s simple enough to be an educational example. I do note, with that mixture of pride and sadness that always accompanies a parent’s observation of children growing up, that, since the Python Cookbook’s first edition, even PyHeartBeat has acquired an alternative server implementation based on Twisted!
Nevertheless, my own baby, the socket module itself, is still the foundation of all network operations in Python. It’s a plain transliteration of the socket APIs (first introduced in BSD Unix and now widespread on all platforms) into the object-oriented paradigm. You create socket objects by calling the socket.socket factory function, then you call methods on these objects to perform typical low-level network operations. You don’t have to worry about allocating and freeing memory for buffers and the like; Python handles that for you automatically. You express IP addresses as (host, port) pairs, in which host is a string in either dotted-quad ('1.2.3.4') or domain-name ('www.python.org') notation. As you can see, even low-level modules in Python aren’t as low level as all that.
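As an illustration of the call order asked about earlier, here is a minimal sketch of one TCP exchange over the loopback interface. It assumes a current Python 3 (the recipes in this chapter use Python 2 syntax); the function name run_echo_once and the choice of port 0 (let the operating system pick a free port) are illustrative assumptions, not part of the original text.

```python
import socket
import threading

def run_echo_once():
    """Serve one TCP client on the loopback interface and echo its message."""
    # Server side: socket -> bind -> listen -> accept
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(1)
    address = server.getsockname()     # the (host, port) pair actually bound

    def serve():
        conn, _peer = server.accept()  # blocks until a client connects
        with conn:
            conn.sendall(conn.recv(1024))

    worker = threading.Thread(target=serve)
    worker.start()

    # Client side: socket -> connect
    with socket.create_connection(address, timeout=5) as client:
        client.sendall(b"hello")
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply
```

The server calls bind, listen, and accept, in that order; connect belongs to the client only (here it is made by socket.create_connection).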
Despite the various conveniences, the socket module still exposes the actual underlying functionality of your operating system’s network sockets. If you’re at all familiar with sockets, you’ll quickly get the hang of Python’s socket module, using Python’s own Library Reference. You’ll then be able to play with sockets interactively in Python to become a socket expert, if that is what you want. The classic, highly recommended work on this subject is W. Richard Stevens, UNIX Network Programming, Volume 1: Networking APIs - Sockets and XTI, 2d ed. (Prentice-Hall). For many practical uses, however, higher-level modules will serve you better.
The Internet uses a sometimes dazzling variety of protocols and formats, and the Python Standard Library supports many of them. In the Python Standard Library, you will find dozens of modules dedicated to supporting specific Internet protocols (such as smtplib to support the SMTP protocol to send mail and nntplib to support the Network News Transfer Protocol (NNTP) to send and receive Network News). In addition, you’ll find about as many modules that support specific Internet formats (such as htmllib to parse HTML data, and the email package to parse and compose various formats related to email, including attachments and encoding).
I cannot even come close to doing justice to the powerful array of tools mentioned in this introduction, nor will you find all of these modules and packages used in this chapter, nor in this book, nor in most programming shops. You may never need to write any program that deals with Network News, for example; if that is the case, you don’t need to study nntplib. But it is still reassuring to know it’s there (part of the “batteries included” approach of the Python Standard Library).
Two higher-level modules that stand out from the crowd, however, are urllib and urllib2. Each of these two modules can deal with several protocols through the magic of URLs: those now-familiar strings, such as http://www.python.org/index.html, that identify a protocol (such as http), a host and port (such as www.python.org, port 80 being the default for the HTTP protocol), and a specific resource at that address (such as /index.html). urllib is very simple to use, but urllib2 is more powerful and extensible. HTTP is the most popular protocol for URLs, but these modules also support several others, such as FTP. In many cases, you’ll be able to use these modules to write typical client-side scripts that interact with any of the supported protocols much more quickly and with less effort than it might take with the various protocol-specific modules.
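The anatomy of a URL described above can be checked interactively. The sketch below assumes a current Python 3, where the parsing helpers live in urllib.parse; the describe function and its small table of default ports are illustrative assumptions.

```python
from urllib.parse import urlsplit

def describe(url):
    """Return the (protocol, host, port, resource) named by a URL."""
    parts = urlsplit(url)
    # Fall back on the protocol's well-known port when the URL names none
    default_ports = {"http": 80, "https": 443, "ftp": 21}
    port = parts.port or default_ports.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path

print(describe("http://www.python.org/index.html"))
# ('http', 'www.python.org', 80, '/index.html')
```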
To illustrate, I’d like to conclude with a cookbook example of my own. It’s similar to Recipe 13.2, but, rather than a program fragment, it’s a little script. I call it wget.py because it does everything for which I’ve ever needed wget. (In fact, I originally wrote this script on a system where wget wasn’t installed but Python was; writing wget.py was a more effective use of my time than downloading and installing the real thing.)
    import sys, urllib
    def reporthook(*a): print a
    for url in sys.argv[1:]:
        i = url.rfind('/')
        file = url[i+1:]
        print url, "->", file
        urllib.urlretrieve(url, file, reporthook)
Pass this script one or more URLs as command-line arguments; the script retrieves them into local files whose names match the last components of the URLs. The script also prints progress information of the form:
(block number, block size, total size)
Obviously, it’s easy to improve on this script; but it’s only seven lines, it’s readable, and it works—and that’s what’s so cool about Python.
Another cool thing about Python is that you can incrementally improve a program like this, and after it’s grown by two or three orders of magnitude, it’s still readable, and it still works! To see what this particular example might evolve into, check out Tools/webchecker/websucker.py in the Python source distribution. Enjoy!
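One small improvement of the kind hinted at above is a report hook that turns the (block number, block size, total size) triple into a percentage. This is only a sketch, written for a current Python 3; the helper name percent_done is an assumption, and the handling of a non-positive total size reflects the documented case where the server does not report a file size.

```python
def percent_done(block_number, block_size, total_size):
    """Turn the report hook's three arguments into a percentage, or None."""
    if total_size <= 0:          # size unknown (the hook may receive -1)
        return None
    # The last block is usually partial, so cap the count at the total
    received = min(block_number * block_size, total_size)
    return 100.0 * received / total_size

def reporthook(block_number, block_size, total_size):
    """Print progress in place of the raw tuple."""
    pct = percent_done(block_number, block_size, total_size)
    print("size unknown" if pct is None else "%5.1f%%" % pct)
```

A hook with this signature can be passed as the third argument of urlretrieve, exactly where the original script passes its own reporthook.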
Credit: Jeff Bauer
You want to communicate small messages between machines on a network in a lightweight fashion, without needing absolute assurance of reliability.
This task is just what the UDP protocol is for, and Python makes it easy for you to access UDP via datagram sockets. You can write a UDP server script (server.py) as follows:
    import socket
    port = 8081
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Accept UDP datagrams, on the given port, from any sender
    s.bind(("", port))
    print "waiting on port:", port
    while True:
        # Receive up to 1,024 bytes in a datagram
        data, addr = s.recvfrom(1024)
        print "Received:", data, "from", addr
You can write a corresponding UDP client script (client.py) as follows:
    import socket
    port = 8081
    host = "localhost"
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.sendto("Holy Guido! It's working.", (host, port))
Sending short text messages with socket datagrams is simple to implement and provides a lightweight message-passing idiom. Socket datagrams should not be used, however, when reliable delivery of data must be guaranteed. If the server isn’t available, your message is lost. However, in many situations, you won’t care whether the message gets lost, or, at least, you do not want to abort a program just because a message can’t be delivered.
Note that the sender of a UDP datagram (the “client” in this example) does not bind the socket before calling the sendto method. On the other hand, to receive UDP datagrams, the socket does have to be bound before calling the recvfrom method.
Don’t use this recipe’s simple code to send large datagram messages, especially under Windows, which may not respect the buffer limit. To send larger messages, you may want to do something like this:
    BUFSIZE = 1024
    while msg:
        bytes_sent = s.sendto(msg[:BUFSIZE], (host, port))
        msg = msg[bytes_sent:]
The sendto method returns the number of bytes it has actually managed to send, so each time, you retry from the point where you left off, while ensuring that no more than BUFSIZE octets are sent in each datagram.
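The same chunking idea can be packaged as two small functions. This is a sketch for a current Python 3, where messages are bytes; the names split_datagrams and send_in_chunks are assumptions made for illustration, and unlike the loop above, the sketch assumes each sendto call sends its whole piece.

```python
def split_datagrams(msg, bufsize=1024):
    """Yield successive pieces of msg, each at most bufsize bytes long."""
    for start in range(0, len(msg), bufsize):
        yield msg[start:start + bufsize]

def send_in_chunks(sock, msg, address, bufsize=1024):
    """Send msg as a series of datagrams; return how many were sent."""
    count = 0
    for piece in split_datagrams(msg, bufsize):
        sock.sendto(piece, address)   # one datagram per piece
        count += 1
    return count
```

Here sock is expected to be a datagram socket such as the one created in client.py, and address a (host, port) pair.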
Note that with datagrams (UDP) you have no guarantee that all (or any) of the pieces that you send as separate datagrams arrive at the destination, nor that the pieces that do arrive are in the same order in which they were sent. If you need to worry about any of these reliability issues, you may be better off with a TCP connection, which gives you all of these assurances and handles many delicate behind-the-scenes aspects nicely on your behalf. Still, I often use socket datagrams for debugging, especially (but not exclusively) where an application spans more than one machine on the same, reliable local area network. The Python Standard Library’s logging module also supports optional use of UDP for its logging output.
Recipe 13.11 for a typical, useful application of UDP datagrams in network operations; documentation for the standard library modules socket and logging in the Library Reference and Python in a Nutshell.
Credit: Gisle Aas, Magnus Bodin
urllib.urlopen returns a file-like object, and you can call the read method on that object to get all of its contents:
    from urllib import urlopen
    doc = urlopen("http://www.python.org").read( )
    print doc
Once you obtain a file-like object from urlopen, you can read it all at once into one big string by calling its read method, as I do in this recipe. Alternatively, you can read the object as a list of lines by calling its readlines method, or, for special purposes, just get one line at a time by looping over the object in a for loop. In addition to these file-like operations, the object that urlopen returns offers a few other useful features. For example, the following snippet gives you the headers of the document:
    doc = urlopen("http://www.python.org")
    print doc.info( )
such as the Content-Type header (text/html in this case) that defines the MIME type of the document. doc.info returns a mimetools.Message instance, so you can access it in various ways besides printing it or otherwise transforming it into a string. For example, doc.info( ).getheader('Content-Type') returns the 'text/html' string. The maintype attribute of the mimetools.Message object is the 'text' string, subtype is the 'html' string, and type is also the 'text/html' string. If you need to perform sophisticated analysis and processing, all the tools you need are right there. At the same time, if your needs are simpler, you can meet them in very simple ways, as this recipe shows.
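For readers on a current Python 3, where mimetools no longer exists, the same questions can be put to an email.message.Message object (the class that HTTP header objects in the modern standard library derive from). The sketch below parses a hand-written header block rather than fetching a page, so the header text itself is a made-up sample.

```python
from email import message_from_string

# A made-up header block, standing in for what a web server might send
raw_headers = "Content-Type: text/html; charset=utf-8\nServer: example\n\n"
headers = message_from_string(raw_headers)

full_value = headers["Content-Type"]        # the complete header value
mime_type = headers.get_content_type()      # main type and subtype together
maintype = headers.get_content_maintype()   # the part before the slash
subtype = headers.get_content_subtype()     # the part after the slash
charset = headers.get_content_charset()     # the charset parameter, if any

print(mime_type, maintype, subtype, charset)
```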
If what you need to do with the document you grab from the Web is specifically to save it to a local file, urllib.urlretrieve is just what you need, as the “Introduction” to this chapter describes.
urllib implicitly supports the use of proxies (as long as the proxies do not require authentication: the current implementation of urllib does not support authentication-requiring proxies). Just set environment variable HTTP_PROXY to a URL, such as 'http://proxy.domain.com:8080', to use the proxy at that URL. If the environment variable HTTP_PROXY is not set, urllib may also look for the information in other platform-specific locations, such as the Windows registry if you’re running under Windows.
If you have more advanced needs, such as using proxies that require authentication, you may use the more sophisticated urllib2 module of the Python Standard Library, rather than simple module urllib. At http://pydoc.org/2.3/urllib2.html, you can find an example of how to use urllib2 for the specific task of accessing the Internet through a proxy that does require authentication.
Documentation for the standard library modules urllib, urllib2, and mimetools in the Library Reference and Python in a Nutshell.
Credit: Mark Nenadov
Several of the FTP sites on your list of sites could be down at any time. You want to filter that list and obtain the list of those sites that are currently up.
Clearly, we first need a function to check whether one particular site is up:
    import socket, ftplib
    def isFTPSiteUp(site):
        try:
            ftplib.FTP(site).quit( )
        except socket.error:
            return False
        else:
            return True
Now, a simple list comprehension can perform the recipe’s task, but we may as well wrap that list comprehension inside another function:
    def filterFTPsites(sites):
        return [site for site in sites if isFTPSiteUp(site)]
Alternatively, filter(isFTPSiteUp, sites) returns exactly the same resulting list as the list comprehension.
Lists of FTP sites are sometimes difficult to maintain, since sites may be closed or temporarily down for all sorts of reasons. The code in this recipe is simple and suitable, for example, for use inside a small interactive program that must let the user choose among FTP sites—we may as well not even present for choice those sites we know are down! If you run this code regularly a few times a day and append the results to a file, the results may also be a basis for long-term maintenance of a list of FTP sites. Any site that has been down for more than a certain number of days should probably be moved away from the main list and into a list of sites that may well have croaked.
Very similar ideas could be used to filter lists of sites that serve protocols other than FTP, by using, instead of standard Python library module ftplib, other such modules, such as nntplib for the NNTP protocol, httplib for the Hypertext Transfer Protocol (HTTP), and so on.
When you’re checking many FTP sites within one program run, it could be much faster to use multiple threads to check on multiple sites at once (so that the delays while waiting for the various sites to respond can overlap), or else use an asynchronous approach. The simple approach presented in this recipe is easiest to program and to understand, but for most real-life networking programs, you do want to enhance performance by using either multithreading or asynchronous approaches, as other recipes in this chapter demonstrate.
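The multithreaded variant mentioned above might look like the following sketch, which assumes a current Python 3 and its concurrent.futures module. The function name filter_sites_concurrently and the max_workers default are assumptions; the checking function is passed in as a parameter, so something like this recipe's isFTPSiteUp could be supplied for it.

```python
from concurrent.futures import ThreadPoolExecutor

def filter_sites_concurrently(sites, is_up, max_workers=10):
    """Return the sites for which is_up(site) is true, preserving order."""
    sites = list(sites)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map runs the checks in parallel but yields results in input order
        results = list(pool.map(is_up, sites))
    return [site for site, up in zip(sites, results) if up]
```

Because the waits for the various sites overlap, the total time is closer to that of the slowest single check than to the sum of all checks.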
Documentation for the standard library modules socket, ftplib, nntplib, and httplib, and built-in function filter, in the Library Reference and Python in a Nutshell.
Credit: Simon Foster
You need to contact an SNTP (Simple Network Time Protocol) server (which respects RFC 2030) to obtain the time of day as returned by that server.
SNTP is quite simple to implement, for example in a small script:
    import socket, struct, sys, time
    TIME1970 = 2208988800L      # Thanks to F.Lundh
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # 48-byte request: first byte '\x1b', remaining 47 bytes zero
    data = '\x1b' + 47 * '\0'
    client.sendto(data, (sys.argv[1], 123))
    data, address = client.recvfrom(1024)
    if data:
        print 'Response received from:', address
        t = struct.unpack('!12I', data)[10]
        t -= TIME1970
        print ' Time=%s' % time.ctime(t)
An SNTP exchange begins with a client sending a 48-byte UDP datagram which starts with byte '\x1b'. The server answers with a 48-byte UDP datagram made up of twelve network-order longwords (4 bytes each). We can easily unpack the server’s returned datagram into a tuple of ints, by using standard Python library module struct’s unpack function. Then, for simplicity, we look only at the eleventh of those twelve longwords. That integer gives the time in seconds, but it measures time from an epoch that’s different from the 1970-based one normally used in Python. The difference in epochs is easily fixed by subtracting the magic number (kindly supplied by F. Lundh) that is named TIME1970 in the recipe. After the subtraction, we have a time in seconds from the epoch that complies with Python’s standard time module, and we can handle it with the functions in module time. In this recipe, we just display it on standard output as formatted by function time.ctime.
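The packet handling described above can be tried without a network connection. The sketch below assumes a current Python 3, where datagrams are bytes; the function names build_request and transmit_seconds are assumptions, and the response used to exercise them is synthetic rather than one captured from a real server.

```python
import struct

TIME1970 = 2208988800   # seconds between the 1900 and 1970 epochs

def build_request():
    """48-byte client request: first byte 0x1b, all other bytes zero."""
    return b"\x1b" + 47 * b"\0"

def transmit_seconds(response):
    """Extract the eleventh longword as seconds since the 1970 epoch."""
    words = struct.unpack("!12I", response[:48])   # twelve network-order ints
    return words[10] - TIME1970

# A synthetic response whose eleventh longword is one day past the 1970 epoch
fake_response = struct.pack("!12I", *([0] * 10 + [TIME1970 + 86400, 0]))
print(transmit_seconds(fake_response))
```

The result of transmit_seconds can be handed to time.ctime, just as the recipe does with t.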
Documentation for the standard library modules socket, struct, and time in the Library Reference and Python in a Nutshell; the SNTP protocol is defined in RFC 2030 (http://www.ietf.org/rfc/rfc2030.txt), and the richer NTP protocol is defined in RFC 1305 (http://www.ietf.org/rfc/rfc1305.txt); Chapter 3 for general issues dealing with time in Python.
Credit: Art Gillespie
You need to send HTML mail and accompany it with a plain text version of the message’s contents, so that the email message is also readable by MUAs that are not HTML-capable.
Although the modern Python way to perform any mail manipulation is with the standard Python library email package, the functionality we need for this recipe is also supplied by the MimeWriter and mimetools modules (which are also in the Python Standard Library). We can easily code a function that just accesses and uses that functionality:
    def createhtmlmail(subject, html, text=None):
        " Create a mime-message that will render as HTML or text, as appropriate"
        import MimeWriter, mimetools, cStringIO
        if text is None:
            # Produce an approximate textual rendering of the HTML string,
            # unless you have been given a better version as an argument
            import htmllib, formatter
            textout = cStringIO.StringIO( )
            formtext = formatter.AbstractFormatter(formatter.DumbWriter(textout))
            parser = htmllib.HTMLParser(formtext)
            parser.feed(html)
            parser.close( )
            text = textout.getvalue( )
            del textout, formtext, parser
        out = cStringIO.StringIO( )           # output buffer for our message
        htmlin = cStringIO.StringIO(html)     # input buffer for the HTML
        txtin = cStringIO.StringIO(text)      # input buffer for the plain text
        writer = MimeWriter.MimeWriter(out)
        # Set up some basic headers. Place subject here because smtplib.sendmail
        # expects it to be in the message, as relevant RFCs prescribe.
        writer.addheader("Subject", subject)
        writer.addheader("MIME-Version", "1.0")
        # Start the multipart section of the message. Multipart/alternative seems
        # to work better on some MUAs than multipart/mixed.
        writer.startmultipartbody("alternative")
        writer.flushheaders( )
        # the plain-text section: just copied through, assuming iso-8859-1
        subpart = writer.nextpart( )
        pout = subpart.startbody("text/plain", [("charset", 'iso-8859-1')])
        pout.write(txtin.read( ))
        txtin.close( )
        # the HTML subpart of the message: quoted-printable, just in case
        subpart = writer.nextpart( )
        subpart.addheader("Content-Transfer-Encoding", "quoted-printable")
        pout = subpart.startbody("text/html", [("charset", 'us-ascii')])
        mimetools.encode(htmlin, pout, 'quoted-printable')
        htmlin.close( )
        # You're done; close your writer and return the message as a string
        writer.lastpart( )
        msg = out.getvalue( )
        out.close( )
        return msg
This recipe’s module is completed in the usual style with a few lines to ensure that, when run as a script, it runs a self-test by composing and sending a sample HTML mail:
    if __name__=="__main__":
        import smtplib
        f = open("newsletter.html", 'r')
        html = f.read( )
        f.close( )
        try:
            f = open("newsletter.txt", 'r')
            text = f.read( )
        except IOError:
            text = None
        subject = "Today's Newsletter!"
        message = createhtmlmail(subject, html, text)
        server = smtplib.SMTP("localhost")
        server.sendmail('[email protected]', '[email protected]', message)
        server.quit( )
Sending HTML mail is a popular concept, and (as long as you avoid sending it to newsgroups and open mailing lists) there’s no reason your Python scripts shouldn’t do it. When you do send HTML mail, never forget to embed a text-only version of your message along with the HTML version. Lots of folks still prefer character-mode mail readers (technically known as MUAs), and it makes no sense to alienate those users by sending mail that they can’t conveniently read. This recipe shows how easy Python makes the task of sending an email in both HTML and text forms.
Ideally, your input will be a properly formatted text version of the message, as well as the HTML version. But, if you don’t have such nice textual input, you can still prepare a text version on the fly starting from the HTML version; one way to prepare such text is shown in the recipe. Remember that htmllib has some limitations, so you may want to use alternative approaches, such as saving the HTML string to disk and then using:
text = os.popen('lynx -dump %s' % tempfile).read( )
or whatever works best for you. Alternatively, if all you have as input is plain text (following some specific conventions, such as empty lines to mark paragraphs and underlines for emphasis), you can parse the text and throw together some HTML markup on the fly.
The emails generated by this code have been successfully read on Outlook 2000, Eudora 4.2, Hotmail, and Netscape Mail. It’s likely that they will work in other HTML-capable MUAs as well. Mutt has been used to test the acceptance of messages generated by this recipe in text-only MUAs. Again, other such MUAs can be expected to work just as acceptably.
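For comparison, here is a sketch of the same text-plus-HTML message built with the email package of a current Python 3 (MimeWriter and mimetools are not available there). The function name create_html_mail is an assumption; unlike the recipe's function, this sketch requires the caller to supply the plain text version.

```python
from email.message import EmailMessage

def create_html_mail(subject, html, text):
    """Build a multipart/alternative message with text and HTML versions."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(text)                      # the text/plain part comes first
    msg.add_alternative(html, subtype="html")  # turns msg into multipart/alternative
    return msg

sample = create_html_mail("Today's Newsletter!",
                          "<html><body><p>Hello</p></body></html>",
                          "Hello")
```

The resulting object can be handed to the send_message method of an smtplib.SMTP instance, or flattened to text with str.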
Recipe 13.6 shows how the email package in the Python Standard Library can also be used to compose a MIME multipart message; documentation in the Library Reference and Python in a Nutshell about the standard library package email, as well as modules mimetools, MimeWriter, htmllib, formatter, cStringIO, and smtplib; Henry Minsky’s article about MIME (http://www.arsdigita.com/asj/mime/) for information on various issues related to sending HTML mail.
Credit: Matthew Dixon Cowles, Hans Fangohr, John Pywtorak
You want to create a multipart MIME (Multipurpose Internet Mail Extensions) message that includes all files in the current directory.
If you often deal with composing or parsing mail messages, or mail-like messages such as Usenet news posts, the Python Standard Library email package gives you very powerful tools to work with. Here is a module that uses email to solve the task posed in the “Problem”:
    #!/usr/bin/env python
    import base64, quopri
    import mimetypes, email.Generator, email.Message
    import cStringIO, os
    # sample addresses
    toAddr = "[email protected]"
    fromAddr = "[email protected]"
    outputFile = "dirContentsMail"
    def main( ):
        mainMsg = email.Message.Message( )
        mainMsg["To"] = toAddr
        mainMsg["From"] = fromAddr
        mainMsg["Subject"] = "Directory contents"
        mainMsg["Mime-version"] = "1.0"
        mainMsg["Content-type"] = "Multipart/mixed"
        mainMsg.preamble = "Mime message\n"
        mainMsg.epilogue = ""     # to ensure that message ends with newline
        # Get names of plain files (not subdirectories or special files)
        fileNames = [f for f in os.listdir(os.curdir) if os.path.isfile(f)]
        for fileName in fileNames:
            contentType, ignored = mimetypes.guess_type(fileName)
            if contentType is None:     # If no guess, use generic opaque type
                contentType = "application/octet-stream"
            contentsEncoded = cStringIO.StringIO( )
            f = open(fileName, "rb")
            mainType = contentType[:contentType.find("/")]
            if mainType=="text":
                cte = "quoted-printable"
                quopri.encode(f, contentsEncoded, 1)   # 1 to also encode tabs
            else:
                cte = "base64"
                base64.encode(f, contentsEncoded)
            f.close( )
            subMsg = email.Message.Message( )
            subMsg.add_header("Content-type", contentType, name=fileName)
            subMsg.add_header("Content-transfer-encoding", cte)
            subMsg.set_payload(contentsEncoded.getvalue( ))
            contentsEncoded.close( )
            mainMsg.attach(subMsg)
        f = open(outputFile, "wb")
        g = email.Generator.Generator(f)
        g.flatten(mainMsg)
        f.close( )
        return None
    if __name__=="__main__":
        main( )
The email package makes manipulating MIME messages a snap. The Python Standard Library also offers other older modules that can serve many of the same purposes, but I suggest you look into email as an alternative to all such other modules. email requires some study because it is a very functionally rich package, but it will amply repay the time you spend studying it.
MIME is the Internet standard for sending files and non-ASCII data by email. The standard is specified in RFCs 2045-2049. A few points are especially worth keeping in mind:
The original specification for the format of an email (RFC 822) didn’t allow for non-ASCII characters and had no provision for attaching or enclosing a file along with a text message. Therefore, not surprisingly, MIME messages are very common these days.
Messages that follow the MIME standard are backward compatible with ordinary RFC 822 (now RFC 2822) messages. An old mail reader (technically, an MUA) that doesn’t understand the MIME specification will probably not be able to display a MIME message in a way that’s useful to the user, but the message will still be legal and therefore shouldn’t cause unexpected behavior.
An RFC 2822 message consists of a set of headers, a blank line, and a body. MIME handles attachments and other multipart documents by specifying a format for the message’s body. In multipart MIME messages, the body is divided into submessages, each of which has a set of headers, a blank line, and a body. Generally, each submessage is referred to as a MIME part, and parts may nest recursively.
MIME parts (whether or not in a multipart message) that contain characters outside of the strict US-ASCII range are encoded as either base-64 or quoted-printable data, so that the resulting mail message contains only ordinary ASCII characters. Data can be encoded with either method, but, generally, only data that has few non-ASCII characters (basically text, possibly with a few extra characters outside of the ASCII range, such as national characters in Latin-1 and similar codes) is worth encoding as quoted-printable, because even without decoding it may be readable. If the data is essentially binary, with all bytes being equally likely, base-64 encoding is more compact.
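The trade-off between the two encodings in the last point is easy to observe. The sketch below assumes a current Python 3 and uses two made-up samples: a short Latin-1 text with a few accented characters, and a block containing every possible byte value.

```python
import base64
import quopri

mostly_ascii = "Café au lait, s'il vous plaît".encode("latin-1")
binary = bytes(range(256))

# Mostly-ASCII text: quoted-printable stays largely readable, base64 does not
print(quopri.encodestring(mostly_ascii))
print(base64.b64encode(mostly_ascii))

# Essentially binary data: base64 is the more compact of the two
qp_size = len(quopri.encodestring(binary))
b64_size = len(base64.b64encode(binary))
print(qp_size, b64_size)
```

In the quoted-printable output only the accented characters are replaced by escapes such as =E9, which is why a human can still read it without decoding.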
Not surprisingly, given all of these issues, manipulating MIME messages is often considered to be a nuisance. In the old times, back before Python 2.2, the standard library’s modules for dealing with MIME messages were quite useful but rather miscellaneous. In particular, putting MIME messages together and taking them apart required two distinct approaches. The email package, which was added in Python 2.2, unified and simplified these two related jobs.
Recipe 13.7 shows how the email package can be used to unpack a MIME message; documentation for the standard library modules email, mimetypes, base64, quopri, and cStringIO in the Library Reference and Python in a Nutshell.
Credit: Matthew Cowles
The walk method of message objects generated by the email package makes this task really easy. Here is a script that uses email to solve the task posed in the “Problem”:
    import email.Parser
    import os, sys
    def main( ):
        if len(sys.argv) != 2:
            print "Usage: %s filename" % os.path.basename(sys.argv[0])
            sys.exit(1)
        mailFile = open(sys.argv[1], "rb")
        p = email.Parser.Parser( )
        msg = p.parse(mailFile)
        mailFile.close( )
        partCounter = 1
        for part in msg.walk( ):
            if part.get_main_type( ) == "multipart":
                continue
            name = part.get_param("name")
            if name is None:
                name = "part-%i" % partCounter
            partCounter += 1
            # In real life, make sure that name is a reasonable filename
            # for your OS; otherwise, mangle that name until it is!
            f = open(name, "wb")
            f.write(part.get_payload(decode=1))
            f.close( )
            print name
    if __name__=="__main__":
        main( )
The email package makes parsing MIME messages reasonably easy. This recipe shows how to unbundle a MIME message with the email package by using the walk method of message objects.
You can create a message object in several ways. For example, you can instantiate the email.Message.Message class and build the message object’s contents with calls to its methods. In this recipe, however, I need to read and analyze an existing message, so I work the other way around, calling the parse method of an email.Parser.Parser instance. The parse method takes as its only argument a file-like object (in the recipe, I pass it a real file object that I just opened for binary reading with the built-in open function) and returns a message object, on which you can call message object methods.
The walk method is a generator (i.e., it returns an iterator object on which you can loop with a for statement). You usually will use this method exactly as I use it in this recipe:
for part in msg.walk( ):
The iterator sequentially returns (depth-first, in case of nesting) the parts that make up the message. If the message is not a container of parts (i.e., has no attachments or alternates, so that message.is_multipart returns false), no problem: the walk method will then return an iterator with a single element, the message itself. In any case, each element of the iterator is also a message object (an instance of email.Message.Message), so you can call on it any of the methods that a message object supplies.
In a multipart message, parts with a type of 'multipart/something' (i.e., a main type of 'multipart') may be present. In this recipe, I skip them explicitly since they’re just glue holding the true parts together. I use the get_main_type method to obtain the main type and check it for equality with 'multipart'; if equality holds, I skip this part and move to the next one with a continue statement. When I know I have a real part in hand, I locate its name (or synthesize one if it has no name), open that name as a file, and write the message’s contents (also known as the message’s payload), which I get by calling the get_payload method, into the file. I use the decode=1 argument to ensure that the payload is decoded back to a binary content (e.g., an image, a sound file, a movie) if needed, rather than remaining in text form. If the payload is not encoded, decode=1 is innocuous, so I don’t have to check before I pass it.
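The walk-and-decode loop discussed above can be exercised in memory, without any files. The sketch below assumes a current Python 3; the function name unpack_parts is an assumption, it uses get_content_maintype and get_filename where the recipe uses get_main_type and get_param("name"), and it collects (name, payload) pairs instead of writing files.

```python
from email.message import EmailMessage

def unpack_parts(msg):
    """Return a list of (name, bytes) pairs for the non-container parts."""
    found = []
    counter = 1
    for part in msg.walk():
        if part.get_content_maintype() == "multipart":
            continue                      # container part: just glue
        name = part.get_filename()
        if name is None:
            name = "part-%i" % counter    # synthesize a name
        counter += 1
        found.append((name, part.get_payload(decode=True)))
    return found

# Build a small sample message to unpack: a text body plus one attachment
sample = EmailMessage()
sample.set_content("hello\n")
sample.add_attachment(b"\x00\x01\x02", maintype="application",
                      subtype="octet-stream", filename="data.bin")
print(unpack_parts(sample))
```

As in the recipe, any name taken from a message should be checked before it is used as a filename on disk.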
Recipe 13.6; documentation for the standard library package email in the Library Reference.
Credit: Anthony Baxter
You’re handling email in Python and need to remove from email messages any attachments that might be dangerous.
Regular expressions can help us identify dangerous content types and file extensions, and thus code a function to remove any potentially dangerous attachments:
    ReplFormat = """
    This message contained an attachment that was stripped out.
    The filename was: %(filename)s,
    The original type was: %(content_type)s
    (and it had additional parameters of:
    %(params)s)
    """
    import re
    BAD_CONTENT_RE = re.compile('application/(msword|msexcel)', re.I)
    # The dots are escaped so that they match only a literal '.'
    BAD_FILEEXT_RE = re.compile(r'(\.exe|\.zip|\.pif|\.scr|\.ps)$')
    def sanitise(msg):
        ''' Strip out all potentially dangerous payloads from a message '''
        ct = msg.get_content_type( )
        fn = msg.get_filename( )
        if BAD_CONTENT_RE.search(ct) or (fn and BAD_FILEEXT_RE.search(fn)):
            # bad message-part, pull out info for reporting then destroy it
            # present the parameters to the content-type, list of key, value
            # pairs, as key=value forms joined by comma-space
            params = msg.get_params( )[1:]
            params = ', '.join([ '='.join(p) for p in params ])
            # put informative message text as new payload
            replace = ReplFormat % dict(content_type=ct, filename=fn, params=params)
            msg.set_payload(replace)
            # now remove parameters and set contents in content-type header
            for k, v in msg.get_params( )[1:]:
                msg.del_param(k)
            msg.set_type('text/plain')
            # Also remove headers that make no sense without content-type
            del msg['Content-Transfer-Encoding']
            del msg['Content-Disposition']
        else:
            # Now we check for any sub-parts to the message
            if msg.is_multipart( ):
                # Call sanitise recursively on any subparts
                payload = [ sanitise(x) for x in msg.get_payload( ) ]
                # Replace the payload with our list of sanitised parts
                msg.set_payload(payload)
        # Return the sanitised message
        return msg
    # Add a simple driver/example to show how to use this function
    if __name__ == '__main__':
        import email, sys
        m = email.message_from_file(open(sys.argv[1]))
        print sanitise(m)
This issue has come up a few times on the newsgroup comp.lang.python, so I decided to post a cookbook entry to show how easy it is to deal with this kind of task. Specifically, this recipe shows how to read in an email message, strip out any dangerous or suspicious attachments, and replace them with a harmless text message informing the user of the alterations that were performed.
This kind of task is particularly important when end users are using something like Microsoft Outlook, which is targeted by harmful virus and worm messages (collectively known as malware) on a daily basis.
The email parser in Python 2.4 has been completely rewritten to be robust first, correct second. Prior to that version, the parser was written for correctness first. But focusing on correctness was a problem because many virus/worm messages and other malware routinely send email messages that are broken and nonconformant—malformed to the point that the old email parser chokes and dies. The new parser is designed to never actually break when reading a message. Instead, it tries its best to fix whatever it can fix in the message. (If you have a message that causes the parser to crash, please let us, the core Python developers, know. It’s a bug, and we’ll fix it. Please include a copy of the message that makes the parser crash, or else it’s very unlikely that we can reproduce your problem!)
The recipe’s code itself is fairly well commented and should be easy enough to follow. A mail message consists of one or more parts; each of these parts can contain nested parts. We call the sanitise function on the top-level Message object, and it calls itself recursively on the subobjects if and as needed.
The sanitise function first checks the Content-Type of the part, and if there’s a filename, it also checks that filename’s extension against a known-to-be-bad list. If the message part is bad, we replace its payload with a short text describing the now-removed part. We then set this message part’s Content-Type to 'text/plain', drop the parameters of the old content type, and remove the other headers that made sense only for the removed content.
If the part is not bad, we check whether it is a multipart message. If so, it means the message has subparts, so we recursively call the sanitise function on each of them. We then replace the payload with our list of sanitized subparts.
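The recursive walk over subparts can be tried without a mailbox. The following is a minimal sketch for a current Python 3 (the recipe itself targets Python 2.4); it builds a sample message with the standard email.message.EmailMessage class. The helper suspicious_filenames and the attachment names are made up for illustration, and the helper only reports what would be stripped rather than altering the message.

```python
import re
from email.message import EmailMessage

# same extension list as the recipe, with the dots escaped
BAD_FILEEXT_RE = re.compile(r'(\.exe|\.zip|\.pif|\.scr|\.ps)$')

def suspicious_filenames(msg):
    """Recursively collect attachment filenames with a suspect extension."""
    if msg.is_multipart():
        found = []
        for part in msg.get_payload():
            found.extend(suspicious_filenames(part))
        return found
    fn = msg.get_filename()
    if fn and BAD_FILEEXT_RE.search(fn):
        return [fn]
    return []

# build a sample multipart message (all names here are illustrative)
msg = EmailMessage()
msg['Subject'] = 'Invoice'
msg.set_content('See attached.')
msg.add_attachment(b'MZ', maintype='application',
                   subtype='octet-stream', filename='invoice.exe')
msg.add_attachment('just notes', filename='notes.txt')

print(suspicious_filenames(msg))
```

Only the leaf parts are examined for a filename; a multipart container simply delegates to its children, which mirrors the structure of sanitise.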
If you’re interested in working further on this recipe, the most important extra functionality, which is easy to add with a small amount of work, might be to store the attached file in some directory (instead of destroying all suspect attachments), and give the user a link to that file. Also consider extending the check in sanitise that filters dangerous attachments to have it verify more than just the content type and file extension; other headers may be able to carry known signs of worm or virus messages.
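Storing a suspect attachment instead of destroying it might look roughly like the sketch below. It is written for Python 3, and the quarantine function, the temporary directory, and the sample part are all assumptions made for illustration; the recipe does not prescribe where or how to store files.

```python
import os
import tempfile
from email.message import EmailMessage

def quarantine(part, directory):
    """Save a part's decoded payload under `directory`; return the path."""
    # keep only the base name, so directory components in the
    # attachment's filename are not honored
    name = os.path.basename(part.get_filename() or 'unnamed')
    path = os.path.join(directory, name)
    with open(path, 'wb') as f:
        f.write(part.get_payload(decode=True))
    return path

# a sample binary part (contents and name are made up)
part = EmailMessage()
part.set_content(b'MZ\x90\x00', maintype='application',
                 subtype='octet-stream', filename='invoice.exe')

quarantine_dir = tempfile.mkdtemp()
saved = quarantine(part, quarantine_dir)
print(saved)
```

The returned path is what a replacement text could mention, in place of the bare "stripped out" notice.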
Documentation for the standard library modules email and re in the Library Reference and Python in a Nutshell.
You’re using Python 2.4’s new email.FeedParser module, but sometimes, when dealing with badly malformed incoming messages, that module produces message objects that are internally inconsistent (e.g., a message has a content-type header that says the message is multipart, but the body isn’t), and you need to fix those inconsistencies.
Python 2.4’s new standard library module email.FeedParser is very useful, but a little post-processing on the messages it returns can heuristically fix some inconsistencies and make it even better. Here’s a module containing a class and a few functions to help with this task:
    import email, email.FeedParser
    import re, sys, sgmllib
    # what chars are non-Ascii, what max fraction of them can be in a text part
    kGuessBinaryThreshold = 0.2
    # three-digit octal escapes: control chars 0-025 and high chars 0200-0377
    kGuessBinaryRE = re.compile(r"[\000-\025\200-\377]")
    # what max fraction of HTML tags can be in a text (non-HTML) part
    kGuessHTMLThreshold = 0.05
    class Cleaner(sgmllib.SGMLParser):
        entitydefs = {"nbsp": " "}    # I'll break if I want to
        def __init__(self):
            sgmllib.SGMLParser.__init__(self)
            self.result = []
        def do_p(self, *junk):
            self.result.append('\n')
        def do_br(self, *junk):
            self.result.append('\n')
        def handle_data(self, data):
            self.result.append(data)
        def cleaned_text(self):
            return ''.join(self.result)
    def stripHTML(text):
        ''' return text, with HTML tags stripped '''
        c = Cleaner()
        try:
            c.feed(text)
        except sgmllib.SGMLParseError:
            return text
        else:
            return c.cleaned_text()
    def guessIsBinary(text):
        ''' return whether we can heuristically guess 'text' is binary '''
        if not text: return False
        nMatches = float(len(kGuessBinaryRE.findall(text)))
        return nMatches/len(text) >= kGuessBinaryThreshold
    def guessIsHTML(text):
        ''' return whether we can heuristically guess 'text' is HTML '''
        if not text: return False
        lt = len(text)
        textWithoutTags = stripHTML(text)
        tagsChars = float(lt-len(textWithoutTags))
        return tagsChars/lt >= kGuessHTMLThreshold
    def getMungedMessage(openFile):
        openFile.seek(0)
        p = email.FeedParser.FeedParser()
        p.feed(openFile.read())
        m = p.close()
        # Fix up multipart content-type when message isn't multi-part
        if m.get_main_type()=="multipart" and not m.is_multipart():
            t = m.get_payload(decode=1)
            if guessIsBinary(t):
                # Use generic "opaque" type
                m.set_type("application/octet-stream")
            elif guessIsHTML(t):
                m.set_type("text/html")
            else:
                m.set_type("text/plain")
        return m
FeedParser is a new module in the Python 2.4 Standard Library’s email package. The module’s name comes from the fact that it maintains a buffer, so that you don’t have to give it all the text at once. Possibly more interesting is that the module doesn’t raise an error when called on malformed messages; instead, it tries to make some sense of them and return a useful email.Message object. That’s useful because so much mail is spam and so much spam is malformed.
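The buffering behavior is easy to observe. Here is a small sketch for Python 3, where the class lives in email.feedparser rather than in the Python 2.4 email.FeedParser module; the message text and the chunk boundaries are arbitrary.

```python
from email.feedparser import FeedParser

parser = FeedParser()
# feed the message in arbitrary chunks, even splitting a header mid-line
for chunk in ('Subj', 'ect: hello\n\nbo', 'dy text\n'):
    parser.feed(chunk)
msg = parser.close()

print(msg['Subject'], repr(msg.get_payload()))
```

The parser holds on to any incomplete trailing line until more text arrives, so the caller never has to align chunks with line boundaries.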
The other side of the coin, given that the heroic feed parser works on incorrect messages, is that you can get back from it an email.Message object that’s internally inconsistent. This recipe tries to make sense of one kind of inconsistency: a message with a content-type header that says that the message is multipart, but the body isn’t.
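The inconsistency in question can be reproduced with a two-header message. The sketch below uses Python 3’s email package and, as a deliberate simplification, always falls back to text/plain instead of applying the recipe’s binary and HTML heuristics; the sample message is made up.

```python
import email

# the header claims multipart, but the body is ordinary text
raw = 'Content-Type: multipart/mixed\nSubject: odd\n\njust plain text\n'
msg = email.message_from_string(raw)

inconsistent = (msg.get_content_maintype() == 'multipart'
                and not msg.is_multipart())
# the parser records what it had to paper over
defects = [type(d).__name__ for d in msg.defects]

if inconsistent:
    msg.set_type('text/plain')    # simplest possible repair, no guessing

print(inconsistent, defects, msg.get_content_type())
```

The test on get_content_maintype and is_multipart is the same condition that getMungedMessage checks before it chooses a replacement type.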
The heuristics that the recipe uses to guess at the correct content-type are inevitably messy. Still, better to have such messy heuristics in recipes, rather than embedded forever in the Python Standard Library.
Documentation for the standard library package email in the Python 2.4 Library Reference.
Credit: Xavier Defrang
You have a POP3 mailbox somewhere, perhaps on a slow connection, and need to examine messages and possibly mark them for deletion interactively.
The poplib module of the Python Standard Library lets you write a script to solve this task quite easily:
    # Interactive script to clean POP3 mailboxes from malformed or too-large mails
    #
    # Iterates over nonretrieved mails, prints selected elements from the headers,
    # prompts interactively about whether each message should be deleted
    import sys, getpass, poplib, re
    # Change according to your needs: POP host, userid, and password
    POPHOST = "pop.domain.com"
    POPUSER = "jdoe"
    POPPASS = ""
    # How many lines to retrieve from body, and which headers to retrieve
    MAXLINES = 10
    HEADERS = "From To Subject".split()
    args = len(sys.argv)
    if args>1: POPHOST = sys.argv[1]
    if args>2: POPUSER = sys.argv[2]
    if args>3: POPPASS = sys.argv[3]
    if args>4: MAXLINES= int(sys.argv[4])
    if args>5: HEADERS = sys.argv[5:]
    # An RE to identify the headers you're actually interested in
    rx_headers = re.compile('|'.join(HEADERS), re.IGNORECASE)
    try:
        # Connect to the POP server and identify the user
        pop = poplib.POP3(POPHOST)
        pop.user(POPUSER)
        # Authenticate user
        if not POPPASS or POPPASS=='=':
            # If no password was supplied, ask for the password
            POPPASS = getpass.getpass("Password for %s@%s:" % (POPUSER, POPHOST))
        pop.pass_(POPPASS)
        # Get and print some general information (msg_count, box_size)
        stat = pop.stat()
        print "Logged in as %s@%s" % (POPUSER, POPHOST)
        print "Status: %d message(s), %d bytes" % stat
        bye = False
        count_del = 0
        for msgnum in range(1, 1+stat[0]):
            # Retrieve headers
            response, lines, bytes = pop.top(msgnum, MAXLINES)
            # Print message info and headers you're interested in
            print "Message %d (%d bytes)" % (msgnum, bytes)
            print "-" * 30
            print "\n".join(filter(rx_headers.match, lines))
            print "-" * 30
            # Input loop
            while True:
                k = raw_input("(d=delete, s=skip, v=view, q=quit) What? ")
                k = k[:1].lower()
                if k == 'd':
                    # Mark message for deletion
                    k = raw_input("Delete message %d? (y/n) " % msgnum)
                    # an empty answer must not count as "yes"
                    if k.lower() == 'y':
                        pop.dele(msgnum)
                        print "Message %d marked for deletion" % msgnum
                        count_del += 1
                        break
                elif k == 's':
                    print "Message %d left on server" % msgnum
                    break
                elif k == 'v':
                    print "-" * 30
                    print "\n".join(lines)
                    print "-" * 30
                elif k == 'q':
                    bye = True
                    break
            # Time to say goodbye?
            if bye:
                print "Bye"
                break
        # Summary
        print "Deleting %d message(s) in mailbox %s@%s" % (
            count_del, POPUSER, POPHOST)
        # Commit operations and disconnect from server
        print "Closing POP3 session"
        pop.quit()
    except poplib.error_proto, detail:
        # Fancy error handling
        print "POP3 Protocol Error:", detail
Sometimes your POP3 mailbox is behind a slow Internet link, and you don’t want to wait for that funny 10MB MPEG movie that you already received twice yesterday to be fully downloaded before you can read your mail. Or maybe a peculiar malformed message is hanging your MUA. Issues of this kind are best tackled interactively, but you need a helpful script to let you examine data about each message and determine which messages should be removed.
I used to deal with this kind of thing by telnetting to the POP (Post Office Protocol) server and trying to remember the POP3 protocol commands (while hoping that the server implements the help command in particular). Nowadays, I use the script presented in this recipe to inspect my mailbox and do some cleaning. Basically, the Python Standard Library POP3 module, poplib, remembers the protocol commands on my behalf, and this script helps me use those commands appropriately.
The script in this recipe uses the poplib module to connect to your mailbox. It then prompts you about what to do with each undelivered message. You can view the top of the message, leave the message on the server, or mark the message for deletion. No particular tricks or hacks are used in this piece of code: it’s a simple example of poplib usage. In addition to being practically useful in emergencies, it can show you how poplib works. The poplib.POP3 call connects to the POP3 server specified as its argument and returns an object representing that connection. We complete the login by calling the user and pass_ methods to specify a user ID and password. Note the trailing underscore in pass_: this method could not be called pass because that is a Python keyword (the do-nothing statement), and by convention, such issues are often solved by appending an underscore to the identifier.
After connection, we keep working with methods of the pop object. The stat method returns the number of messages and the total size of the mailbox in bytes. The top method takes a message number and a line count n; it returns the server’s response, a list of lines holding the message’s headers plus the first n lines of its body, and the size of that data. The dele method also takes a message-number argument and marks that message for deletion (without renumbering all other messages); the server performs the deletions when the session ends normally. When we’re done, we call the quit method. If you’re familiar with the POP3 protocol, you’ll notice the close correspondence between these methods and the POP3 commands.
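The header filtering that the script applies to the lines returned by top can be exercised on its own. The sketch below is a variation, not the script’s exact pattern: it anchors each header name on the following colon, so that To does not also match a header that merely starts with those letters. The sample lines are invented, and in Python 3 the lines returned by poplib are bytes, so this assumes they have already been decoded to str.

```python
import re

HEADERS = "From To Subject".split()
# requiring the colon keeps "To" from also matching, say, "Topic:"
rx_headers = re.compile(r'(?:%s):' % '|'.join(map(re.escape, HEADERS)),
                        re.IGNORECASE)

# made-up sample of headers followed by the start of a body
lines = [
    'Received: from mail.example.com',
    'From: alice@example.com',
    'To: bob@example.com',
    'Topic: not a header we asked for',
    'Subject: lunch?',
    '',
    'Are you free at noon?',
]
wanted = [line for line in lines if rx_headers.match(line)]
print("\n".join(wanted))
```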
Documentation for the standard library modules poplib and getpass in the Library Reference and Python in a Nutshell; the POP protocol is described in RFC 1939 (http://www.ietf.org/rfc/rfc1939.txt).
Credit: Nicola Larosa
You need to monitor the working state of a number of computers connected to a TCP/IP network.
The key idea in this recipe is to have every computer periodically send a heartbeat UDP packet to a computer acting as the server for this heartbeat-monitoring service. The server keeps track of how much time has passed since each computer last sent a heartbeat and reports on computers that have been silent for too long.
Here is the “client” program, HeartbeatClient.py, which must run on every computer we need to monitor:
    """ Heartbeat client, sends out a UDP packet periodically """
    import socket, time
    SERVER_IP = '192.168.0.15'; SERVER_PORT = 43278; BEAT_PERIOD = 5
    print 'Sending heartbeat to IP %s , port %d' % (SERVER_IP, SERVER_PORT)
    print 'press Ctrl-C to stop'
    while True:
        hbSocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        hbSocket.sendto('PyHB', (SERVER_IP, SERVER_PORT))
        if __debug__:
            print 'Time: %s' % time.ctime()
        time.sleep(BEAT_PERIOD)
The server program, which receives and keeps track of these “heartbeats”, must run on the machine whose address is given as SERVER_IP in the “client” program. The server must support concurrency, since many heartbeats from different computers might arrive simultaneously. A server program has essentially two ways to support concurrency: multithreading, or asynchronous operation. Here is a multithreaded ThreadedBeatServer.py, using only modules from the Python Standard Library:
    """ Threaded heartbeat server """
    import socket, threading, time
    UDP_PORT = 43278; CHECK_PERIOD = 20; CHECK_TIMEOUT = 15
    class Heartbeats(dict):
        """ Manage shared heartbeats dictionary with thread locking """
        def __init__(self):
            super(Heartbeats, self).__init__()
            self._lock = threading.Lock()
        def __setitem__(self, key, value):
            """ Create or update the dictionary entry for a client """
            self._lock.acquire()
            try:
                super(Heartbeats, self).__setitem__(key, value)
            finally:
                self._lock.release()
        def getSilent(self):
            """ Return a list of clients with heartbeat older than CHECK_TIMEOUT """
            limit = time.time() - CHECK_TIMEOUT
            self._lock.acquire()
            try:
                silent = [ip for (ip, ipTime) in self.items() if ipTime < limit]
            finally:
                self._lock.release()
            return silent
    class Receiver(threading.Thread):
        """ Receive UDP packets and log them in the heartbeats dictionary """
        def __init__(self, goOnEvent, heartbeats):
            super(Receiver, self).__init__()
            self.goOnEvent = goOnEvent
            self.heartbeats = heartbeats
            self.recSocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            self.recSocket.settimeout(CHECK_TIMEOUT)
            self.recSocket.bind(('', UDP_PORT))
        def run(self):
            while self.goOnEvent.isSet():
                try:
                    data, addr = self.recSocket.recvfrom(5)
                    if data == 'PyHB':
                        self.heartbeats[addr[0]] = time.time()
                except socket.timeout:
                    pass
    def main(num_receivers=3):
        receiverEvent = threading.Event()
        receiverEvent.set()
        heartbeats = Heartbeats()
        receivers = []
        for i in range(num_receivers):
            receiver = Receiver(goOnEvent=receiverEvent, heartbeats=heartbeats)
            receiver.start()
            receivers.append(receiver)
        print 'Threaded heartbeat server listening on port %d' % UDP_PORT
        print 'press Ctrl-C to stop'
        try:
            while True:
                silent = heartbeats.getSilent()
                print 'Silent clients: %s' % silent
                time.sleep(CHECK_PERIOD)
        except KeyboardInterrupt:
            print 'Exiting, please wait...'
            receiverEvent.clear()
            for receiver in receivers:
                receiver.join()
            print 'Finished.'
    if __name__ == '__main__':
        main()
As an alternative, here is an asynchronous AsyncBeatServer.py program based on the powerful Twisted framework:
    import time
    from twisted.application import internet, service
    from twisted.internet import protocol
    from twisted.python import log
    UDP_PORT = 43278; CHECK_PERIOD = 20; CHECK_TIMEOUT = 15
    class Receiver(protocol.DatagramProtocol):
        """ Receive UDP packets and log them in the "client"s dictionary """
        def datagramReceived(self, data, (ip, port)):
            if data == 'PyHB':
                self.callback(ip)
    class DetectorService(internet.TimerService):
        """ Detect clients not sending heartbeats for too long """
        def __init__(self):
            internet.TimerService.__init__(self, CHECK_PERIOD, self.detect)
            self.beats = {}
        def update(self, ip):
            self.beats[ip] = time.time()
        def detect(self):
            """ Log a list of clients with heartbeat older than CHECK_TIMEOUT """
            limit = time.time() - CHECK_TIMEOUT
            silent = [ip for (ip, ipTime) in self.beats.items() if ipTime < limit]
            log.msg('Silent clients: %s' % silent)
    application = service.Application('Heartbeat')
    # define and link the silent clients' detector service
    detectorSvc = DetectorService()
    detectorSvc.setServiceParent(application)
    # create an instance of the Receiver protocol, and give it the callback
    receiver = Receiver()
    receiver.callback = detectorSvc.update
    # define and link the UDP server service, passing the receiver in
    udpServer = internet.UDPServer(UDP_PORT, receiver)
    udpServer.setServiceParent(application)
    # each service is started automatically by Twisted at launch time
    log.msg('Asynchronous heartbeat server listening on port %d\n'
        'press Ctrl-C to stop\n' % UDP_PORT)
When a number of computers are connected by a TCP/IP network, we are often interested in monitoring their working state. The client and server programs presented in this recipe help you detect when a computer stops working, while having minimal impact on network traffic and requiring very little setup. Note that this recipe does not monitor the working state of single, specific services running on a machine, just that of the TCP/IP stack and the underlying operating system and hardware components.
This PyHeartBeat approach is made up of two files: a client program, HeartbeatClient.py, sends UDP packets to the server, while a server program, either ThreadedBeatServer.py (using only modules from the Python Standard Library to implement a multithreaded approach) or AsyncBeatServer.py (implementing an asynchronous approach based on the powerful Twisted framework), runs on a central computer to listen for such packets and detect inactive clients. Client programs, running on any number of computers, periodically send UDP packets to the server program that runs on the central computer. The server program, in either version, dynamically builds a dictionary that stores the IP addresses of the “client” computers and the timestamp of the last packet received from each one. At the same time, the server program periodically scans the dictionary for timestamps older than a defined timeout, to identify clients that have been silent too long.
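Stripped of threads and sockets, the bookkeeping both servers share is a dictionary scan. A minimal sketch follows, with the clock passed in as an argument so that the result is reproducible; the function name silent_clients and the sample addresses are illustrative, not part of the recipe.

```python
CHECK_TIMEOUT = 15

def silent_clients(beats, now, timeout=CHECK_TIMEOUT):
    """Return the IPs whose last heartbeat is older than `timeout` seconds."""
    limit = now - timeout
    return sorted(ip for ip, stamp in beats.items() if stamp < limit)

# last heartbeat time per client, in seconds (made-up values)
beats = {'10.0.0.1': 100.0, '10.0.0.2': 118.0}
print(silent_clients(beats, now=120.0))
```

With the clock at 120 seconds the limit is 105, so only the client last heard from at 100 is reported.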
In this kind of application, there is no need to use reliable TCP connections since the loss of a packet now and then does not produce false alarms, as long as the server-checking timeout is kept suitably larger than the “client” sending period. Since we may have hundreds of computers to monitor, it is best to keep the bandwidth used and the load on the server at a minimum: we do this by periodically sending a small UDP packet, instead of setting up a relatively expensive TCP connection per client.
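How little is involved in one heartbeat can be seen by sending a single datagram over the loopback interface. This is a Python 3 sketch (so the payload is bytes), and it binds to port 0 to let the operating system pick a free port instead of using the recipe’s fixed port.

```python
import socket
import time

# receiver: bind first, so the datagram has somewhere to go
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(('127.0.0.1', 0))      # port 0: let the OS choose
receiver.settimeout(2.0)
server_addr = receiver.getsockname()

# sender: no connection setup, just one small packet
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b'PyHB', server_addr)

beats = {}
data, (ip, port) = receiver.recvfrom(5)
if data == b'PyHB':
    beats[ip] = time.time()

sender.close()
receiver.close()
print(sorted(beats))
```

There is no handshake and no per-client state on the sending side, which is what keeps the cost per monitored machine so low.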
The packets are sent from each client every 5 seconds, while the server checks the dictionary every 20 seconds, and the server’s timeout defaults to 15 seconds. These parameters, along with the server IP number and port used, can be adapted to one’s needs.
In the threaded server, a small number of worker threads listen to the UDP packets coming from the clients, while the main thread periodically checks the recorded heartbeats. The shared data structure, a dictionary, must be locked and released at each access, both while writing and reading, to avoid data corruption on concurrent access. Such data corruption would typically manifest itself as intermittent, time-dependent bugs that are difficult to reproduce, investigate, and correct.
A very sound alternative to such meticulous use of locking around access to a resource is to dedicate a specialized thread to be the only one interacting with the resource (in this case, the dictionary), while all other threads send work requests to the specialized thread with a Queue.Queue instance. A Queue-based approach is more scalable when per-resource locking gets too complicated to manage easily: Queue is less bug-prone and, in particular, avoids worries about deadlocks. See Recipe 9.3, Recipe 9.5, Recipe 9.4, and Recipe 11.9 for more information about Queue and examples of using Queue to structure the architecture of a multithreaded program.
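A rough sketch of that architecture, in Python 3 (where the module is named queue rather than Queue): one owner thread is the only code that touches the dictionary, and the other threads merely post (ip, timestamp) requests. The names owner and report, the None sentinel, and the sample data are all choices made for this illustration.

```python
import queue
import threading

def owner(requests, beats):
    """The only thread that ever touches `beats`; no lock is needed."""
    while True:
        item = requests.get()
        if item is None:          # sentinel: no more work
            break
        ip, stamp = item
        beats[ip] = stamp

requests = queue.Queue()
beats = {}
owner_thread = threading.Thread(target=owner, args=(requests, beats))
owner_thread.start()

def report(ip, stamp):
    # worker threads never see the dictionary, only the queue
    requests.put((ip, stamp))

workers = [threading.Thread(target=report, args=('10.0.0.%d' % i, float(i)))
           for i in range(1, 4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

requests.put(None)                # all reports are queued; ask the owner to stop
owner_thread.join()
print(sorted(beats.items()))
```

The queue does all the synchronization; a check for silent clients would be posted as just another kind of request to the owner thread.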
The Twisted server employs an asynchronous, event-driven model based on the Twisted framework (http://www.twistedmatrix.com/). The framework is built around a central “reactor” that dispatches events from a queue in a single thread, and monitors network and host resources. The user program is composed of short code fragments invoked by the reactor when dispatching the matching events. Such a working model guarantees that only one user code fragment is executing at any given time, eliminating at the root all problems of concurrent access to shared data structures. Asynchronous servers can provide excellent performance and scalability under very heavy loads, by avoiding the threading and locking overheads of multithreaded servers.
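The dispatching idea can be illustrated with a toy event loop. To be clear, ToyReactor below is invented for this sketch and is not Twisted’s API; it only shows why handlers that run one at a time in a single thread can share a dictionary without locks.

```python
from collections import deque

class ToyReactor(object):
    """A toy single-threaded event loop; NOT Twisted's API."""
    def __init__(self):
        self.events = deque()
        self.handlers = {}
    def on(self, kind, handler):
        self.handlers[kind] = handler
    def post(self, kind, payload):
        self.events.append((kind, payload))
    def run(self):
        # handlers run strictly one after another, so they need no locks
        while self.events:
            kind, payload = self.events.popleft()
            self.handlers[kind](payload)

CHECK_TIMEOUT = 15
beats, reports = {}, []

def datagram(event):
    ip, when = event
    beats[ip] = when

def tick(now):
    reports.append(sorted(ip for ip, stamp in beats.items()
                          if stamp < now - CHECK_TIMEOUT))

reactor = ToyReactor()
reactor.on('datagram', datagram)
reactor.on('tick', tick)
reactor.post('datagram', ('10.0.0.1', 0.0))
reactor.post('datagram', ('10.0.0.2', 18.0))
reactor.post('tick', 20.0)
reactor.run()
print(reports)
```

A real reactor also waits on sockets and timers to produce the events; here they are posted by hand to keep the example deterministic.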
The asynchronous server program presented in this recipe is composed of one application and two services, the UDPServer and the DetectorService. It is invoked at any command shell by means of the twistd command, with the following options:
$ twistd -ony AsyncBeatServer.py
The twistd command controls the reactor, and many other delicate facets of a server’s operation, leaving the script it loads the sole responsibility of defining a global variable named application, implementing the needed services, and connecting the service objects to the application object.
Normally, twistd runs as a daemon and logs to a file (or to other logging facilities, depending on configuration options), but in this case, with the -ony flags, we’re specifically asking twistd to run in the foreground and with logging to standard output, so we can better see what’s going on. Note that the most popular file extension for scripts to be loaded by twistd is .tac, although in this recipe I have used the more generally familiar extension .py. The choice of file extension is just a convention, in this case: twistd can work with Python source files with any file extension, since you pass the full filename, extension included, as an explicit command-line argument anyway.
Documentation for the standard library modules socket, threading, Queue, and time in the Library Reference and Python in a Nutshell; twisted is at http://www.twistedmatrix.com; Jeff Bauer has a related program, known as Mr. Creosote (http://starship.python.net/crew/jbauer/creosote/), using UDP for logging information; UDP is described in depth in W. Richard Stevens, UNIX Network Programming, Volume 1: Networking APIs-Sockets and XTI, 2d ed. (Prentice-Hall); for the truly curious, the UDP protocol is defined in the two-page RFC 768 (http://www.ietf.org/rfc/rfc768.txt), which, when compared with current RFCs, shows how much the Internet infrastructure has evolved in 20 years.
Credit: Magnus Lyckå
The Python Standard Library BaseHTTPServer module makes it easy to implement special-purpose HTTP servers. For example, here is a special-purpose HTTP server program that runs local commands on the server host to get the data for replies to each GET request:
    import BaseHTTPServer, shutil, os
    from cStringIO import StringIO
    class MyHTTPRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
        # HTTP paths we serve, and what commandline-commands we serve them with
        cmds = {'/ping': 'ping www.thinkware.se',
                '/netstat' : 'netstat -a',
                '/tracert': 'tracert www.thinkware.se',
                '/srvstats': 'net statistics server',
                '/wsstats': 'net statistics workstation',
                '/route' : 'route print',
                }
        def do_GET(self):
            """ Serve a GET request. """
            f = self.send_head()
            if f:
                f = StringIO()
                machine = os.popen('hostname').readlines()[0]
                if self.path == '/':
                    heading = "Select a command to run on %s" % (machine)
                    body = (self.getMenu() +
                            "<p>The screen won't update until the selected "
                            "command has finished. Please be patient.")
                else:
                    heading = "Execution of ``%s'' on %s" % (
                               self.cmds[self.path], machine)
                    cmd = self.cmds[self.path]
                    body = '<a href="/">Main Menu</a><pre>%s</pre>\n' % \
                           os.popen(cmd).read()
                    # Translation CP437 -> Latin 1 needed for Swedish Windows.
                    body = body.decode('cp437').encode('latin1')
                f.write("<html><head><title>%s</title></head>\n" % heading)
                f.write('<body><H1>%s</H1>\n' % (heading))
                f.write(body)
                f.write('</body></html>\n')
                f.seek(0)
                self.copyfile(f, self.wfile)
                f.close()
            return f
        def do_HEAD(self):
            """ Serve a HEAD request. """
            f = self.send_head()
            if f:
                f.close()
        def send_head(self):
            path = self.path
            if not path in ['/'] + self.cmds.keys():
                head = 'Command "%s" not found. Try one of these:<ul>' % path
                msg = head + self.getMenu()
                self.send_error(404, msg)
                return None
            self.send_response(200)
            self.send_header("Content-type", 'text/html')
            self.end_headers()
            f = StringIO()
            f.write("A test %s\n" % self.path)
            f.seek(0)
            return f
        def getMenu(self):
            keys = self.cmds.keys()
            keys.sort()
            msg = []
            for k in keys:
                msg.append('<li><a href="%s">%s => %s</a></li>' %(
                                                          k, k, self.cmds[k]))
            msg.append('</ul>')
            return "\n".join(msg)
        def copyfile(self, source, outputfile):
            shutil.copyfileobj(source, outputfile)
    def main(HandlerClass = MyHTTPRequestHandler,
             ServerClass = BaseHTTPServer.HTTPServer):
        BaseHTTPServer.test(HandlerClass, ServerClass)
    if __name__ == '__main__':
        main()
The Python Standard Library module BaseHTTPServer makes it easy to set up custom web servers on an internal network. This way, you can run commands on various machines by just visiting those servers with a browser. The code in this recipe is Windows-specific, indeed specific to the version of Windows normally run in Sweden, because it knows about code page 437 providing the encoding for the various commands’ results. The commands themselves are Windows ones, but that’s just as easy to customize for your own purposes as the encoding issue: for example, you can use traceroute (the Unix spelling of the command) instead of tracert (the way Windows spells it).
In this recipe, all substantial work is performed by external commands invoked by os.popen calls. Of course, it would be perfectly feasible to satisfy some or all of the requests by running actual Python code within the same process as the web server. We would normally not worry about concurrency issues for this kind of special-purpose, ad hoc, administrative server (unlike most web servers): the scenario it’s intended to cover is one system administrator sitting at her system and visiting, with her browser, various machines on the network being administered or monitored, so concurrency is not really needed. If your scenario is somewhat different so that you do need concurrency, then multithreading and asynchronous operations, shown in several other recipes, are your fundamental options.
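Serving a request with Python code in the same process, rather than through os.popen, might look like the following sketch. It is written for Python 3, where BaseHTTPServer has become http.server; the InfoHandler class, its two paths, and the use of the platform module are assumptions chosen for illustration. The server listens only on the loopback interface, on a port picked by the operating system.

```python
import http.client
import platform
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class InfoHandler(BaseHTTPRequestHandler):
    # paths we serve, and the in-process callables that produce each reply
    pages = {'/python': platform.python_version,
             '/system': platform.system}

    def do_GET(self):
        producer = self.pages.get(self.path)
        if producer is None:
            self.send_error(404, 'Unknown path')
            return
        body = producer().encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass    # keep the example quiet

server = HTTPServer(('127.0.0.1', 0), InfoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# act as the administrator's browser for one request
conn = http.client.HTTPConnection('127.0.0.1', server.server_address[1],
                                  timeout=5)
conn.request('GET', '/python')
response = conn.getresponse()
status, reply = response.status, response.read().decode('utf-8')
conn.close()

server.shutdown()
server.server_close()
print(status, reply)
```

As in the recipe, the set of things a visitor can trigger is a fixed table keyed by path, so nothing from the request is ever passed to a shell or evaluated.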
Documentation for the standard library modules BaseHTTPServer, shutil, os, and cStringIO in the Library Reference and Python in a Nutshell.
You need to forward a network port to another host (forwarding), possibly to a different port number (redirecting).
Classes using the threading and socket modules can provide port forwarding and redirecting:
    import sys, socket, time, threading
    LOGGING = True
    loglock = threading.Lock()
    def log(s, *a):
        if LOGGING:
            loglock.acquire()
            try:
                print '%s:%s' % (time.ctime(), (s % a))
                sys.stdout.flush()
            finally:
                loglock.release()
    class PipeThread(threading.Thread):
        pipes = []
        pipeslock = threading.Lock()
        def __init__(self, source, sink):
            # only the threading module is imported, so qualify the base class
            threading.Thread.__init__(self)
            self.source = source
            self.sink = sink
            log('Creating new pipe thread %s ( %s -> %s )',
                self, source.getpeername(), sink.getpeername())
            self.pipeslock.acquire()
            try:
                self.pipes.append(self)
            finally:
                self.pipeslock.release()
            self.pipeslock.acquire()
            try:
                pipes_now = len(self.pipes)
            finally:
                self.pipeslock.release()
            log('%s pipes now active', pipes_now)
        def run(self):
            while True:
                try:
                    data = self.source.recv(1024)
                    if not data:
                        break
                    # sendall, since a plain send may transmit only part of data
                    self.sink.sendall(data)
                except:
                    break
            log('%s terminating', self)
            self.pipeslock.acquire()
            try:
                self.pipes.remove(self)
            finally:
                self.pipeslock.release()
            self.pipeslock.acquire()
            try:
                pipes_left = len(self.pipes)
            finally:
                self.pipeslock.release()
            log('%s pipes still active', pipes_left)
    class Pinhole(threading.Thread):
        def __init__(self, port, newhost, newport):
            threading.Thread.__init__(self)
            log('Redirecting: localhost:%s -> %s:%s', port, newhost, newport)
            self.newhost = newhost
            self.newport = newport
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self.sock.bind(('', port))
            self.sock.listen(5)
        def run(self):
            while True:
                newsock, address = self.sock.accept()
                log('Creating new session for %s:%s', *address)
                fwd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                fwd.connect((self.newhost, self.newport))
                PipeThread(newsock, fwd).start()
                PipeThread(fwd, newsock).start()
A short ending to this pinhole.py module, with the usual guard to run this part only when pinhole is run as a main script rather than imported, lets us offer this recipe’s functionality as a command-line script:
    if __name__ == '__main__':
        print 'Starting Pinhole port forwarder/redirector'
        import sys
        # get the arguments, give help in case of errors
        try:
            port = int(sys.argv[1])
            newhost = sys.argv[2]
            try:
                newport = int(sys.argv[3])
            except IndexError:
                newport = port
        except (ValueError, IndexError):
            print 'Usage: %s port newhost [newport]' % sys.argv[0]
            sys.exit(1)
        # start operations
        sys.stdout = open('pinhole.log', 'w')
        Pinhole(port, newhost, newport).start()
Port forwarding and redirecting can often come in handy when you’re operating a network, even a small one. Applications or other services, possibly not under your control, may be hardwired to connect to servers on certain addresses or ports; by interposing a forwarder and redirector, you can send such applications’ connection requests onto any other host and/or port that suits you better.
The code in this recipe supplies two classes that liberally use threading to provide this functionality and a small “main script” at the end, with the usual if __name__ == '__main__' guard, to deliver this functionality as a command-line script. For once, the small “main script” is not just for demonstration and testing purposes but is actually quite useful on its own. For example:
# python pinhole.py 80 webserver
forwards all incoming HTTP sessions on standard port 80 to host webserver;
# python pinhole.py 23 localhost 2323
redirects all incoming telnet sessions on standard port 23 to port 2323 on this same host (since localhost is the conventional hostname for “this host” in all TCP/IP implementations).
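The heart of each PipeThread is a loop that copies bytes from one socket to another until the source is closed. The Python 3 sketch below isolates that loop in a hypothetical pump function and exercises it with two socket.socketpair() pairs standing in for the client-side and server-side connections, so no ports need to be opened.

```python
import socket
import threading

def pump(source, sink):
    """Copy bytes from source to sink until source reports end-of-stream."""
    while True:
        data = source.recv(1024)
        if not data:
            break
        sink.sendall(data)
    sink.shutdown(socket.SHUT_WR)     # pass the end-of-stream along

# client <-> fwd_in is the incoming side, fwd_out <-> server the outgoing side
client, fwd_in = socket.socketpair()
fwd_out, server = socket.socketpair()

t = threading.Thread(target=pump, args=(fwd_in, fwd_out))
t.start()

client.sendall(b'GET / HTTP/1.0\r\n\r\n')
client.shutdown(socket.SHUT_WR)       # the client has nothing more to send

received = b''
while True:
    chunk = server.recv(1024)
    if not chunk:
        break
    received += chunk

t.join()
for s in (client, fwd_in, fwd_out, server):
    s.close()
print(received)
```

A full forwarder runs two such loops per session, one in each direction, which is why Pinhole starts a pair of PipeThread instances for every accepted connection.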
Documentation for the standard library modules socket and threading in the Library Reference and Python in a Nutshell.
You need to tunnel SSL (Secure Socket Layer) communications through a proxy, but the Python Standard Library doesn’t support that functionality out of the box.
We can code a generic proxy, defaulting to SSL but, in fact, good for all kinds of network protocols. Save the following code as module file pytunnel.py somewhere along your Python sys.path:
    import threading, socket, traceback, sys, base64, time
    def recv_all(the_socket, timeout=1):
        ''' receive all data available from the_socket, waiting no more than
            ``timeout'' seconds for new data to arrive; return data as string.'''
        # use non-blocking sockets
        the_socket.setblocking(0)
        total_data = []
        begin = time.time()
        while True:
            ''' loop until timeout '''
            if total_data and time.time()-begin > timeout:
                break     # if you got some data, then break after timeout seconds
            elif time.time()-begin > timeout*2:
                break     # if you got no data at all yet, wait a little longer
            try:
                data = the_socket.recv(4096)
                if data:
                    total_data.append(data)
                    begin = time.time()       # reset start-of-wait time
                else:
                    time.sleep(0.1)           # give data some time to arrive
            except:
                pass
        return ''.join(total_data)
    class thread_it(threading.Thread):
        ''' thread instance to run a tunnel, or a tunnel-client '''
        done = False
        def __init__(self, tid='', proxy='', server='', tunnel_client='',
                     port=0, ip='', timeout=1):
            threading.Thread.__init__(self)
            self.tid = tid
            self.proxy = proxy
            self.port = port
            self.server = server
            self.tunnel_client = tunnel_client
            self.ip = ip; self._port = port
            self.data = {}     # store data here to get later
            self.timeout = timeout
        def run(self):
            try:
                if self.proxy and self.server:
                    ''' running tunnel operation, so bridge server <-> proxy '''
                    new_socket = False
                    while not thread_it.done:    # loop until termination
                        if not new_socket:
                            new_socket, address = self.server.accept()
                        else:
                            self.proxy.sendall(
                                recv_all(new_socket, timeout=self.timeout))
                            new_socket.sendall(
                                recv_all(self.proxy, timeout=self.timeout))
                elif self.tunnel_client:
                    ''' running tunnel client, just mark down when it's done '''
                    self.tunnel_client(self.ip, self.port)
                    thread_it.done = True    # normal termination
            except Exception, error:
                traceback.print_exc()
                print error
                thread_it.done = True        # orderly termination upon exception
    class build(object):
        ''' build a tunnel object, ready to run two threads as needed '''
        def __init__(self, host='', port=443, proxy_host='', proxy_port=80,
                     proxy_user='', proxy_pass='', proxy_type='', timeout=1):
            self._port=port; self.host=host; self._phost=proxy_host
            self._puser=proxy_user; self._pport=proxy_port; self._ppass=proxy_pass
            self._ptype=proxy_type; self.ip='127.0.0.1'; self.timeout=timeout
            self._server, self.server_port = self.get_server()
        def get_proxy(self):
            if not self._ptype:
                proxy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                proxy.connect((self._phost, self._pport))
                proxy_authorization = ''
                if self._puser:
                    proxy_authorization = ('Proxy-authorization: Basic ' +
                        base64.encodestring(self._puser+':'+self._ppass
                                           ).strip() + '\r\n')
                # the request line needs a space before the HTTP version
                proxy_connect = 'CONNECT %s:%s HTTP/1.0\r\n' % (
                                self.host, self._port)
                user_agent = 'User-Agent: pytunnel\r\n'
                proxy_pieces = proxy_connect+proxy_authorization+user_agent+'\r\n'
                proxy.sendall(proxy_pieces+'\r\n')
                response = recv_all(proxy, timeout=0.5)
                # second whitespace-separated field of the status line
                status = response.split(None, 2)[1]
                if int(status)/100 != 2:
                    print 'error', response
                    raise RuntimeError(status)
                return proxy
        def get_server(self):
            port = 2222
            server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            server.bind(('localhost', port))
            server.listen(5)
            return server, port
        def run(self, func):
            Threads = []
            Threads.append(thread_it(tid=0, proxy=self.get_proxy(),
                                     server=self._server, timeout=self.timeout))
            Threads.append(thread_it(tid=1, tunnel_client=func, ip=self.ip,
                                     port=self.server_port, timeout=0.5))
            for Thread in Threads:
                Thread.start()
            for Thread in Threads:
                Thread.join()
Here is how you would typically use this pytunnel module in a small example script that tunnels an SSL connection through a proxy:
import pytunnel, httplib

def tunnel_this(ip, port):
    conn = httplib.HTTPSConnection(ip, port=port)
    conn.putrequest('GET', '/')
    conn.endheaders( )
    response = conn.getresponse( )
    print response.read( )

tunnel = pytunnel.build(host='login.yahoo.com', proxy_host='h1',
                        proxy_user='u', proxy_pass='p')
tunnel.run(tunnel_this)
This example assumes you have a proxy server running on host h1, which is ready to accept basic authentication for a proxy user named u with a proxy password of p. Since it’s unlikely that this is, in fact, your specific setup, you’ll have to tweak these parameters if you want to see an example of this recipe’s code running. But you understand the general idea: you instantiate class pytunnel.build, with all appropriate parameters passed with named-argument syntax, to build a tunnel object; then, you call the tunnel object’s method run, passing as its argument the function that you want to be “tunneled” through the proxy. That function, in turn, receives as its arguments an IP address and a port number, and can connect to that address and port via SSL or any protocol implying SSL/TLS (Transport Layer Security), such as HTTPS.
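To make the proxy handshake concrete, here is a minimal sketch of the request that a tunnel of this kind sends to the proxy and of how the reply's status code can be read. It is written in current Python 3 syntax rather than the Python 2 of the recipe, and the helper names build_connect_request and parse_status are hypothetical, chosen only for this illustration; they are not part of pytunnel.

```python
import base64

def build_connect_request(host, port, user="", password=""):
    """Return the bytes of an HTTP CONNECT request for host:port.

    If user is non-empty, a Basic Proxy-Authorization header is added.
    (Hypothetical helper, not part of the recipe's module.)
    """
    lines = ["CONNECT %s:%d HTTP/1.0" % (host, port)]
    if user:
        credentials = ("%s:%s" % (user, password)).encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        lines.append("Proxy-Authorization: Basic " + token)
    lines.append("User-Agent: pytunnel")
    # Header lines end with CR-LF; an empty line closes the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def parse_status(response):
    """Return the numeric status code from the first line of a proxy reply."""
    # The status line has the shape b"HTTP/1.0 200 reason text".
    return int(response.split(None, 2)[1])

request = build_connect_request("login.yahoo.com", 443, "u", "p")
print(request.decode("ascii"))
```

A status in the 200 range means the proxy has opened the connection, and everything sent afterwards is passed through to the target host unchanged.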
Internally, the tunnel object instantiates two threads that are instances of thread_it, one to run the tunnel client function, the other to perform the tunneling operation itself. The tunneling operation, in turn, is nothing more than a loop, running until the client function is done, where all data available are received from one party and resent to the other, and vice versa; function recv_all deals with the task of receiving all available data, while the socket method sendall does the sending. The thread_it instance which runs the tunneling operation, therefore, does nothing more than loop over just such calls.
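The receive-then-forward step described above can be sketched as follows. This is a variation, not the recipe's code: it is written in Python 3, it waits with select instead of polling a non-blocking socket, and the names recv_available and relay_once are hypothetical. The demonstration uses socket.socketpair to stand in for the client and proxy connections.

```python
import select
import socket

def recv_available(sock, timeout=1.0):
    """Collect data from sock until it stays quiet for timeout seconds."""
    chunks = []
    while True:
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            break              # nothing arrived within the timeout
        data = sock.recv(4096)
        if not data:
            break              # the peer closed the connection
        chunks.append(data)
    return b"".join(chunks)

def relay_once(a, b, timeout=1.0):
    """Forward pending data from a to b, then from b to a."""
    b.sendall(recv_available(a, timeout))
    a.sendall(recv_available(b, timeout))

# Two socket pairs stand in for the client side and the proxy side.
client, tunnel_in = socket.socketpair()
tunnel_out, server = socket.socketpair()
client.sendall(b"hello")
server.sendall(b"world")
relay_once(tunnel_in, tunnel_out, timeout=0.1)
print(server.recv(16), client.recv(16))
```

Calling relay_once repeatedly until a termination flag is set gives the same overall shape as the tunneling thread's loop.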
The code shown in this recipe is still being actively developed at the time of writing. For the latest version, see http://ftp.gnu.org/pub/savannah/files/pytunnel/pytunnel.py. Another alternative worth considering for tunneling and forwarding is Twisted’s simple proxy (http://www.twistedmatrix.com/), but I have not personally tried that one yet.
For SSL/TLS standards, http://www.ietf.org/html.charters/tls-charter.html; documentation for the standard library modules socket, threading, and time in the Library Reference and Python in a Nutshell.
Credit: Nicola Paolucci, Mark Rowe, Andrew Notspecified
You use a Dynamic DNS Service that accepts the GnuDIP protocol (like yi.org), and you need a command-line script to update the IP address recorded with that service.
The Twisted framework has plenty of power for all kinds of network tasks, so we can use it to write a script to implement GnuDIP:
import md5, sys
from twisted.internet import protocol, reactor
from twisted.protocols import basic
from twisted.python import usage

def hashPassword(password, salt):
    ''' compute and return MD5 hash for given password and `salt'. '''
    p1 = md5.md5(password).hexdigest( ) + '.' + salt.strip( )
    return md5.md5(p1).hexdigest( )

class DIPProtocol(basic.LineReceiver):
    """ Implementation of GnuDIP protocol(TCP) as described at:
    http://gnudip2.sourceforge.net/gnudip-www/latest/gnudip/html/protocol.html
    """
    delimiter = '\n'
    def connectionMade(self):
        ''' at connection, we start in state "expecting salt". '''
        basic.LineReceiver.connectionMade(self)
        self.expectingSalt = True
    def lineReceived(self, line):
        ''' we received a full line, either "salt" or normal response '''
        if self.expectingSalt:
            self.saltReceived(line)
            self.expectingSalt = False
        else:
            self.responseReceived(line)
    def saltReceived(self, salt):
        """ Override this 'abstract method' """
        raise NotImplementedError
    def responseReceived(self, response):
        """ Override this 'abstract method' """
        raise NotImplementedError

class DIPUpdater(DIPProtocol):
    """ A simple class to update an IP, then disconnect. """
    def saltReceived(self, salt):
        ''' having received `salt', login to the DIP server '''
        password = self.factory.getPassword( )
        username = self.factory.getUsername( )
        domain = self.factory.getDomain( )
        msg = '%s:%s:%s:2' % (username, hashPassword(password, salt), domain)
        self.sendLine(msg)
    def responseReceived(self, response):
        ''' response received: show errors if any, then disconnect. '''
        code = response.split(':', 1)[0]
        if code == '0':
            pass  # OK
        elif code == '1':
            print 'Authentication failed'
        else:
            print 'Unexpected response from server:', repr(response)
        self.transport.loseConnection( )

class DIPClientFactory(protocol.ClientFactory):
    """ Factory used to instantiate DIP protocol instances with
        correct username, password and domain.
    """
    protocol = DIPUpdater
    # simply collect data for login and provide accessors to them
    def __init__(self, username, password, domain):
        self.u = username
        self.p = password
        self.d = domain
    def getUsername(self):
        return self.u
    def getPassword(self):
        return self.p
    def getDomain(self):
        return self.d
    def clientConnectionLost(self, connector, reason):
        ''' terminate script when we have disconnected '''
        reactor.stop( )
    def clientConnectionFailed(self, connector, reason):
        ''' show error message in case of network problems '''
        print 'Connection failed. Reason:', reason
        reactor.stop( )    # otherwise the script would never terminate

class Options(usage.Options):
    ''' parse options from commandline or config script '''
    optParameters = [['server', 's', 'gnudip2.yi.org', 'DIP Server'],
                     ['port', 'p', 3495, 'DIP Server port'],
                     ['username', 'u', 'durdn', 'Username'],
                     ['password', 'w', None, 'Password'],
                     ['domain', 'd', 'durdn.yi.org', 'Domain']]

if __name__ == '__main__':
    # running as main script: first, get all the needed options
    config = Options( )
    try:
        config.parseOptions( )
    except usage.UsageError, errortext:
        print '%s: %s' % (sys.argv[0], errortext)
        print '%s: Try --help for usage details.' % (sys.argv[0])
        sys.exit(1)
    server = config['server']
    port = int(config['port'])
    password = config['password']
    if not password:
        print 'Password not entered. Try --help for usage details.'
        sys.exit(1)
    # and now, start operations (via Twisted's ``reactor'')
    reactor.connectTCP(server, port,
           DIPClientFactory(config['username'], password, config['domain']))
    reactor.run( )
I wanted to use a Dynamic DNS Service called yi.org, but I did not like the option of installing the suggested small client application to update my IP address on my OpenBSD box. So I resorted to writing the script shown in this recipe. I put it into my crontab to keep my domain always up-to-date with my dynamic IP address at home.
This little script is now at version 0.4, and its development history is quite instructive. I thought that even the first version, 0.1, which I got working in a few minutes, effectively demonstrated the power of the Twisted framework in developing network applications, so I posted that version on the ActiveState cookbook site. Lo and behold, Mark first, then Andrew, showered me with helpful suggestions, and I repeatedly updated the script in response to their advice. So it now demonstrates even better, not just the power of Twisted, but more generally the power of collaborative development in an open-source or free-software community.
To give just one example: originally, I had overridden buildProtocol and passed the factory object to the protocol object explicitly. The factory object, in the Twisted framework architecture, is where shared state is kept (in this case, the username, password, and domain), so I had to ensure the protocol knew about the factory, or so I thought. It turns out that, exactly because just about every protocol needs to know about its factory object, Twisted takes care of it in its own default implementation of buildProtocol, making the factory object available as the factory attribute of every protocol object. So, my code, which duplicated Twisted’s built-in functionality in this regard, was simply ripped out, and the recipe’s code is simpler and better as a result.
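The pattern just described, in which the factory builds each protocol object and hands it a reference back to itself, can be illustrated without Twisted at all. The sketch below uses plain Python 3 classes; MiniProtocol, MiniFactory, and build_protocol are hypothetical stand-ins invented for this illustration and are not Twisted classes or methods.

```python
class MiniProtocol:
    """Hypothetical stand-in for a protocol class (not part of Twisted)."""
    factory = None

    def login_fields(self):
        # Shared state is read through the factory that built this object.
        return (self.factory.username, self.factory.domain)

class MiniFactory:
    """Hypothetical stand-in for a factory that holds shared state."""
    protocol = MiniProtocol

    def __init__(self, username, domain):
        self.username = username
        self.domain = domain

    def build_protocol(self):
        # Instantiate the protocol class and point it back at its factory,
        # so that no subclass has to repeat this wiring.
        p = self.protocol()
        p.factory = self
        return p

factory = MiniFactory("durdn", "durdn.yi.org")
proto = factory.build_protocol()
print(proto.login_fields())
```

Because the wiring lives in one place, a protocol subclass such as the recipe's DIPUpdater only has to read self.factory; it never needs to be told about the factory explicitly.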
Too often, software is presented as a finished and polished artifact, as if it sprang pristine and perfect like Athena from Zeus’ forehead. This gives entirely the wrong impression to budding software developers, making them feel inadequate because their code isn’t born perfect and fully developed. So, as a counterweight, I thought it important to present one little story about how software actually grows and develops!
One last detail: it’s tempting to place methods updateIP and removeIP in the DIPProtocol class, to ease the writing of subclasses such as DIPUpdater. However, in my view, that would be an over-generalization, overkill for such a simple, lightweight recipe as Python and Twisted make this one. In practice we won’t need all that many dynamic IP protocol subclasses, and if it turns out that we’re wrong and we do, in fact, need them, hey, refactoring is clearly not a hard task with such a fluid, dynamic language and powerful frameworks to draw on. So, respect the prime directive: “do the simplest thing that can possibly work.”
In a sense, the code in this recipe could be said to violate the prime directive, because it uses an elegant object-oriented architecture with an abstract base class, a concrete subclass to specialize it, and, in the factory class, accessor methods rather than simple attribute access for the login data (i.e., user, password, domain). All of these niceties are lifesavers in big programs, but they admittedly could be forgone for a program of only 120 lines (which would shrink a little further if it didn’t use all these niceties). However, adopting a uniform style of program architecture, even for small programs, eases the refactoring task in those not-so-rare cases where a small program grows into a big one. So, I have deliberately developed the habit of always coding in such an “elegant OO way”, and once the habit is acquired, I find that it enhances, rather than reduces, my productivity.
The GnuDIP protocol is specified at http://gnudip2.sourceforge.net/gnudip-www/latest/gnudip/html/protocol.html; Twisted is at http://www.twistedmatrix.com/.
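As a small illustration of the login step in this recipe's code (a salted MD5 hash, followed by a line of the form user:hash:domain:2), here is a sketch in current Python 3 using hashlib instead of the old md5 module. The function names hash_password and login_line are hypothetical, and the sample salt is made up for the demonstration; consult the protocol specification for the authoritative details.

```python
import hashlib

def hash_password(password, salt):
    """Return md5(md5(password) + '.' + salt) as a hexadecimal string."""
    inner = hashlib.md5(password.encode("utf-8")).hexdigest()
    salted = inner + "." + salt.strip()
    return hashlib.md5(salted.encode("utf-8")).hexdigest()

def login_line(username, password, domain, salt):
    """Build the line sent after the salt arrives, as the recipe does."""
    return "%s:%s:%s:2" % (username, hash_password(password, salt), domain)

# 'abc123' is a made-up salt; a real one is sent by the server on connect.
print(login_line("durdn", "secret", "durdn.yi.org", "abc123\n"))
```

Since the salt changes with every connection, the hash that travels over the wire differs each time, even though the password stays the same.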
Credit: Gian Mario Tagliaretti, J P Calderone
You want to connect to an IRC (Internet Relay Chat) server, join a channel, and store private messages into a file on your hard disk for future reading.
The Twisted framework has excellent support for many network protocols, including IRC, so we can perform this recipe’s task with a very simple script:
from twisted.internet import reactor, protocol
from twisted.protocols import irc

class LoggingIRCClient(irc.IRCClient):
    logfile = file('/tmp/msg.txt', 'a+')
    nickname = 'logging_bot'
    def signedOn(self):
        self.join('#test_py')
    def privmsg(self, user, channel, message):
        self.logfile.write(user.split('!')[0] + ' -> ' + message + '\n')
        self.logfile.flush( )

def main( ):
    f = protocol.ReconnectingClientFactory( )
    f.protocol = LoggingIRCClient
    reactor.connectTCP('irc.freenode.net', 6667, f)
    reactor.run( )

if __name__ == '__main__':
    main( )
If, for some strange reason, you cannot use Twisted, then you can implement similar functionality from scratch based only on the Python Standard Library. Here’s a reasonable approach: nowhere near as simple, solid, and robust as Twisted, and lacking its performance benefits, but nevertheless sort of workable:
import socket

SERVER = 'irc.freenode.net'
PORT = 6667
NICKNAME = 'logging_bot'
CHANNEL = '#test_py'
IRC = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

def irc_conn( ):
    IRC.connect((SERVER, PORT))
def send_data(command):
    IRC.send(command + '\r\n')    # RFC 1459 ends each message with CR-LF
def join(channel):
    send_data("JOIN %s" % channel)
def login(nickname, username='user', password=None, realname='Pythonist',
          hostname='Helena', servername='Server'):
    send_data("USER %s %s %s %s" %
              (username, hostname, servername, realname))
    send_data("NICK %s" % nickname)

irc_conn( )
login(NICKNAME)
join(CHANNEL)
filetxt = open('/tmp/msg.txt', 'a+')
try:
    while True:
        buffer = IRC.recv(1024)
        if not buffer:
            break       # the server closed the connection
        msg = buffer.split( )
        if not msg:
            continue    # nothing but whitespace was received
        if msg[0] == "PING":
            # answer PING with PONG, as RFC 1459 specifies
            send_data("PONG %s" % msg[1])
        elif len(msg) > 3 and msg[1] == 'PRIVMSG' and msg[2] == NICKNAME:
            nick_name = msg[0][:msg[0].find("!")]
            message = ' '.join(msg[3:])
            filetxt.write(nick_name.lstrip(':') + ' -> ' +
                          message.lstrip(':') + '\n')
            filetxt.flush( )
finally:
    filetxt.close( )
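The loop above treats each chunk returned by recv as exactly one message, which holds only by luck: a chunk may carry several lines, or end in the middle of one. A minimal sketch of a more careful approach, in Python 3 and with the hypothetical helper names split_lines and handle_line, separates the buffering from the decision about what to do with each line. It assumes the received bytes have already been decoded to text.

```python
def split_lines(buffer):
    """Split buffer into complete lines plus the unfinished remainder."""
    *lines, rest = buffer.split("\n")
    return [line.rstrip("\r") for line in lines], rest

def handle_line(line, nickname):
    """Return ('pong', token), ('log', text), or None for one IRC line."""
    parts = line.split(None, 3)
    if not parts:
        return None
    if parts[0] == "PING" and len(parts) > 1:
        return ("pong", parts[1])
    if len(parts) == 4 and parts[1] == "PRIVMSG" and parts[2] == nickname:
        sender = parts[0].lstrip(":").split("!")[0]
        text = parts[3]
        if text.startswith(":"):
            text = text[1:]      # drop the marker of the trailing parameter
        return ("log", sender + " -> " + text)
    return None

data = "PING :abc\r\n:alice!a@h PRIVMSG logging_bot :hello there\r\n:bo"
lines, rest = split_lines(data)
for line in lines:
    print(handle_line(line, "logging_bot"))
```

The remainder returned by split_lines is meant to be prepended to the next chunk received, so that a line split across two recv calls is reassembled before it is handled.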
For this roll-our-own reimplementation, we do need some understanding of the protocol’s RFC, such as the need to answer a server’s PING with a proper PONG to confirm that our connection is alive. In any case, since the code has already grown to more than twice the size of what Twisted requires, we’ve omitted niceties (which are very important for reliable unattended operation) such as automatic reconnection attempts when the connection drops, which Twisted gives us effortlessly via its protocol.ReconnectingClientFactory.
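To give an idea of what the omitted reconnection logic involves, here is a minimal standard-library sketch in Python 3 that retries with a growing delay. It is not Twisted's algorithm: the function name run_with_reconnect, its parameters, and their default values are all assumptions made for this illustration, and connect_and_serve stands for any hypothetical callable that connects, runs the session, and raises OSError when the connection fails or drops.

```python
import time

def run_with_reconnect(connect_and_serve, max_attempts=5,
                       initial_delay=1.0, factor=2.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call connect_and_serve until it returns normally or attempts run out.

    Returns the list of delays that were waited, for inspection.
    """
    delay = initial_delay
    delays = []
    for attempt in range(max_attempts):
        try:
            connect_and_serve()
            return delays              # the session ended without an error
        except OSError:
            if attempt == max_attempts - 1:
                raise                  # give up after the last attempt
            sleep(delay)
            delays.append(delay)
            # Wait longer before each new attempt, up to max_delay.
            delay = min(delay * factor, max_delay)
    return delays

# Demonstration with a fake session that fails twice, then succeeds.
outcomes = [OSError("refused"), OSError("reset"), None]
def fake_session():
    result = outcomes.pop(0)
    if result is not None:
        raise result

print(run_with_reconnect(fake_session, sleep=lambda seconds: None))
```

Passing the sleep function as a parameter keeps the sketch easy to exercise without actually waiting; in real use the default time.sleep applies.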
Documentation for the standard library module socket in the Library Reference and Python in a Nutshell; twisted is at http://www.twistedmatrix.com.
Credit: John Nielsen
You need to access an LDAP (Lightweight Directory Access Protocol) server from your Python programs.
The simplest solution is offered by the freely downloadable third-party extension ldap (http://python-ldap.sourceforge.net). This script shows a few LDAP operations with ldap:
import ldap

try:
    path = 'cn=people,ou=office,o=company'
    l = ldap.open('hostname')
    # set which protocol to use, if you do not like the default
    l.protocol_version = ldap.VERSION2
    # bind synchronously, so the session is authenticated before we go on
    l.simple_bind_s('cn=root,ou=office,o=company','password')
    # search for surnames beginning with a
    # available options for how deep a search you want:
    # ldap.SCOPE_BASE, ldap.SCOPE_ONELEVEL, ldap.SCOPE_SUBTREE
    a = l.search_s(path, ldap.SCOPE_SUBTREE, 'sn='+'a*')
    # delete fred
    l.delete_s('cn=fred,'+path)
    # add barney
    # note: objectclass depends on the LDAP server
    user_info = {'uid':'barney123',
                 'givenname':'Barney',
                 'cn':'barney123',
                 'sn':'Smith',
                 'telephonenumber':'123-4567',
                 'facsimiletelephonenumber':'987-6543',
                 'objectclass':('Remote-Address','person', 'Top'),
                 'physicaldeliveryofficename':'Services',
                 'mail':'[email protected]',
                 'title':'programmer',
                }
    id = 'cn=barney,'+path
    l.add_s(id, user_info.items( ))
except ldap.LDAPError, error:
    print 'problem with ldap:', error
The ldap module wraps the open source OpenLDAP C API. However, with ldap, your Python program can talk to various versions of LDAP servers, as long as they’re standards compliant, not just to OpenLDAP servers.
The recipe shows a script with a few example uses of the ldap module. For simplicity, all the functions the recipe calls from the library are the '_s' versions (e.g., search_s): this means the functions are synchronous; that is, they wait for a response or an error code and don’t return control to your program until either an error or a response appears from the server. Asynchronous programming is less elementary than synchronous, although it can often offer far better performance and scalability.
LDAP is widely used to keep and coordinate network-accessible information, particularly in large and geographically distributed organizations. Essentially, LDAP lets you organize information, search for it, create new items, and delete existing items. The ldap module lets your Python program perform the search, creation, and deletion functions.
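The recipe builds its search filter by plain string concatenation ('sn='+'a*'), which is fine for a fixed prefix. If the prefix comes from user input, characters that have a special meaning in LDAP filters should be escaped first. The following Python 3 sketch shows the idea with two hypothetical helpers, escape_filter_value and surname_prefix_filter; it is a standalone illustration, and the ldap package offers its own helper for this purpose in its ldap.filter module.

```python
def escape_filter_value(value):
    """Escape characters that are special inside an LDAP search filter."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)

def surname_prefix_filter(prefix):
    """Build a filter matching surnames that start with prefix."""
    # The trailing * is the wildcard; everything before it is literal text.
    return "(sn=%s*)" % escape_filter_value(prefix)

print(surname_prefix_filter("a"))
print(surname_prefix_filter("O(Brien*"))
```

The resulting string can be passed as the filter argument of a search call such as the search_s used in the recipe.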
http://python-ldap.sourceforge.net/docs.shtml for all the documentation about the ldap module and other relevant pointers.