Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Resuming the HTTP Download of a File

Credit: Chris Moffitt

Problem

You need to resume an HTTP download of a file that has been partially transferred.

Solution

Large downloads are sometimes interrupted. However, a good HTTP server that supports the Range header lets you resume the download from where it was interrupted. The standard Python module urllib lets you access this functionality almost seamlessly. You need to add only the needed header and intercept the error code the server sends to confirm that it will respond with a partial file:

import urllib, os

class myURLOpener(urllib.FancyURLopener):
    """ Subclass to override error 206 (partial file being sent); okay for us """
    def http_error_206(self, url, fp, errcode, errmsg, headers, data=None):
        pass    # Ignore the expected "non-error" code

def getrest(dlFile, fromUrl, verbose=0):
    loop = 1
    existSize = 0
    myUrlclass = myURLOpener(  )
    if os.path.exists(dlFile):
        outputFile = open(dlFile,"ab")
        existSize = os.path.getsize(dlFile)
        # If the file exists, then download only the remainder
        myUrlclass.addheader("Range","bytes=%s-" % (existSize))
    else:
        outputFile = open(dlFile,"wb")

    webPage = myUrlclass.open(fromUrl)
    if verbose:
        for k, v in webPage.headers.items(  ):
            print k, "=", v

    # If we already have the whole file, there is no need to download it again
    numBytes = 0
    webSize = int(webPage.headers['Content-Length'])
    if webSize == existSize:
        if verbose: print "File (%s) was already downloaded from URL (%s)"%(
            dlFile, fromUrl)
    else:
        if verbose: print "Downloading %d more bytes" % (webSize-existSize)
        while 1:
            data = webPage.read(8192)
            if not data:
                break
            outputFile.write(data)
            numBytes = numBytes + len(data)

    webPage.close(  )
    outputFile.close(  )

    if verbose:
        print "downloaded", numBytes, "bytes from", webPage.url
    return numbytes

Discussion

The HTTP Range header lets the web server know that you want only a certain range of data to be downloaded, and this recipe takes advantage of this header. Of course, the server needs to support the Range header, but since the header is part of the HTTP 1.1 specification, it’s widely supported. This recipe has been tested with Apache 1.3 as the server, but I expect no problems with other reasonably modern servers.

The recipe lets urllib.FancyURLopener to do all the hard work of adding a new header, as well as the normal handshaking. I had to subclass it to make it known that the error 206 is not really an error in this case—so you can proceed normally. I also do some extra checks to quit the download if I’ve already downloaded the whole file.

Check out the HTTP 1.1 RFC (2616) to learn more about what all of the headers mean. You may find a header that is very useful, and Python’s urllib lets you send any header you want. This recipe should probably do a check to make sure that the web server accepts Range, but this is pretty simple to do.

Table of Contents for
Resuming the HTTP Download of a File

Resuming the HTTP Download of a File

Problem

Solution

Discussion

See Also

Table of Contents for Resuming the HTTP Download of a File

Create new playlist

Sign In

Sign Up

Resuming the HTTP Download of a File

Problem

Solution

Discussion

See Also

Table of Contents for
Resuming the HTTP Download of a File