So that's it for the urllib package. As you can see, the standard library is more than adequate for most HTTP tasks. We haven't touched upon all of its capabilities, though. There are numerous handler classes that we haven't discussed, and the opener interface is extensible.
However, the API isn't the most elegant, and there have been several attempts to improve it. One of these is the very popular third-party library called Requests. It's available as the requests package on PyPI. It can either be installed with pip or downloaded from http://docs.python-requests.org, which hosts the documentation.
The Requests library automates and simplifies many of the tasks that we've been looking at. The quickest way of illustrating this is by trying some examples.

The commands for retrieving a URL with Requests are similar to retrieving a URL with the urllib package, as shown here:
>>> import requests
>>> response = requests.get('http://www.debian.org')
And we can look at properties of the response object. Try:
>>> response.status_code
200
>>> response.reason
'OK'
>>> response.url
'http://www.debian.org/'
>>> response.headers['content-type']
'text/html'
Note that the header name in the preceding command is in lowercase. The keys in the headers attribute of Requests response objects are case insensitive.
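The headers attribute is in fact a CaseInsensitiveDict from requests.structures, so we can confirm this behavior directly without making a request. A small sketch (the header value here is our own example):

```python
from requests.structures import CaseInsensitiveDict

# requests stores response headers in a CaseInsensitiveDict, so a
# key can be looked up with any casing
headers = CaseInsensitiveDict()
headers['Content-Type'] = 'text/html'

print(headers['content-type'])    # text/html
print(headers['CONTENT-TYPE'])    # text/html
print('CoNtEnT-tYpE' in headers)  # True
```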
There are some convenience attributes that have been added to the response object:
>>> response.ok
True
The ok attribute indicates whether the request was successful; that is, the response had a status code in the 200 range. Also:
>>> response.is_redirect
False
The is_redirect attribute indicates whether the request was redirected. We can also access the request properties through the response object:
>>> response.request.headers
{'User-Agent': 'python-requests/2.3.0 CPython/3.4.1 Linux/3.2.0-4-amd64', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*'}
Notice that Requests is automatically handling compression for us. It's including gzip and deflate in an Accept-Encoding header. If we look at the Content-Encoding response header, then we will see that the response was in fact gzip compressed, and Requests transparently decompressed it for us:
>>> response.headers['content-encoding']
'gzip'
We can look at the response content in several other ways. To get the same bytes object as we got from an HTTPResponse object, perform the following:
>>> response.content
b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html lang="en">...
But Requests also performs automatic decoding for us. To get the decoded content, do this:
>>> response.text
'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html lang="en"> <head> ...
Notice that this is now str rather than bytes. The Requests library uses values in the headers to choose a character set and decode the content to Unicode for us. If it can't get a character set from the headers, then it uses the chardet library (http://pypi.python.org/pypi/chardet) to make an estimate from the content itself. We can see what encoding Requests has chosen here:
>>> response.encoding
'ISO-8859-1'
We can even ask it to change the encoding that it has used:
>>> response.encoding = 'utf-8'
After changing the encoding, subsequent references to the text attribute for this response will return the content decoded using the new encoding setting.
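What happens here can be sketched without a network connection: the raw bytes in content are decoded with whatever encoding is currently set, so changing the encoding changes what text returns. Here's an illustration using plain bytes (the string is our own example, not from the Debian page):

```python
# the raw body as bytes, as requests would hold it in response.content
raw = 'café'.encode('utf-8')

# decoded with a wrong charset such as ISO-8859-1, the bytes come out mangled
print(raw.decode('iso-8859-1'))  # cafÃ©

# decoded with the right charset, we get the original text back --
# this is what happens to .text after response.encoding is corrected
print(raw.decode('utf-8'))       # café
```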
The Requests library automatically handles cookies. Give the following a try:
>>> response = requests.get('http://www.github.com')
>>> print(response.cookies)
<<class 'requests.cookies.RequestsCookieJar'> [<Cookie logged_in=no for .github.com/>, <Cookie _gh_sess=eyJzZxNz... for .github.com/>]>
The Requests library also has a Session class, which allows the reuse of cookies. This is similar to using the http.cookiejar module's CookieJar and the urllib module's HTTPCookieProcessor objects. Do the following to reuse the cookies in subsequent requests:
>>> s = requests.Session()
>>> s.get('http://www.google.com')
>>> response = s.get('http://google.com/preferences')
The Session object has the same interface as the requests module, so we use its get() method in the same way as we use the requests.get() method. Now, any cookies encountered are stored in the Session object, and they will be sent with corresponding requests when we use the get() method in the future.
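The cookies a Session holds live in its cookies attribute, which is a RequestsCookieJar. As a sketch that works without hitting the network, we can set a hypothetical cookie on a session by hand and read it back; any request made through the session to a matching domain would then send it:

```python
import requests

s = requests.Session()

# set a cookie manually -- the name, value, and domain here are
# hypothetical, just to show that the jar persists on the session
s.cookies.set('session_id', 'abc123', domain='example.com')

# the cookie is now part of the session's jar and will accompany
# subsequent requests made through s to matching domains
print(s.cookies.get('session_id'))  # abc123
```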
Redirects are also automatically followed, in the same way as when using urllib, and any redirected requests are captured in the history attribute.
The different HTTP methods are easily accessible; each has its own function:
>>> response = requests.head('http://www.google.com')
>>> response.status_code
200
>>> response.text
''
Custom headers are added to requests in a similar way as they are when using urllib:
>>> headers = {'User-Agent': 'Mozilla/5.0 Firefox 24'}
>>> response = requests.get('http://www.debian.org', headers=headers)
Making requests with query strings is a straightforward process:
>>> params = {':action': 'search', 'term': 'Are you quite sure this is a cheese shop?'}
>>> response = requests.get('http://pypi.python.org/pypi', params=params)
>>> response.url
'https://pypi.python.org/pypi?%3Aaction=search&term=Are+you+quite+sure+this+is+a+cheese+shop%3F'
The Requests library takes care of all the encoding and formatting for us.
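We can observe this encoding step in isolation by preparing a request without sending it: requests.Request plus its prepare() method yields the final URL that would go over the wire. A small sketch (the search term is our own example):

```python
import requests

# build the request but don't send it; prepare() performs the same
# query-string encoding that requests.get() would apply
req = requests.Request('GET', 'http://pypi.python.org/pypi',
                       params={'term': 'cheese shop?'})
prepared = req.prepare()

print(prepared.url)
# http://pypi.python.org/pypi?term=cheese+shop%3F
```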
Posting is similarly simplified, although we use the data keyword argument here:
>>> data = {'P': 'Python'}
>>> response = requests.post('http://search.debian.org/cgi-bin/omega', data=data)
Errors in Requests are handled slightly differently from how they are handled with urllib. Let's work through some error conditions and see how it works. Generate a 404 error by doing the following:
>>> response = requests.get('http://www.google.com/notawebpage')
>>> response.status_code
404
In this situation, urllib would have raised an exception, but notice that Requests doesn't. The Requests library can check the status code and raise a corresponding exception, but we have to ask it to do so:
>>> response.raise_for_status()
...
requests.exceptions.HTTPError: 404 Client Error
Now, try it on a successful request:
>>> r = requests.get('http://www.google.com')
>>> r.status_code
200
>>> r.raise_for_status()
It doesn't do anything, which in most situations would let our program exit a try/except block and then continue as we would want it to.
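The usual pattern, then, is to call raise_for_status() inside a try/except block. We can exercise both paths without the network by hand-building bare Response objects and setting their status codes ourselves (normally these would come back from requests.get(); the helper function name is our own):

```python
import requests

def fetch_ok(response):
    """Return the response if the status is good, or None on an HTTP error."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        print('Request failed:', e)
        return None
    return response

# hand-built responses, just for demonstration
good = requests.Response()
good.status_code = 200
bad = requests.Response()
bad.status_code = 404

print(fetch_ok(good) is good)  # True
print(fetch_ok(bad) is None)   # True
```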
What happens if we get an error that is lower in the protocol stack? Try the following:
>>> r = requests.get('http://192.0.2.1')
...
requests.exceptions.ConnectionError: HTTPConnectionPool(...
We have made a request to a host that doesn't exist, and once it has timed out, we get a ConnectionError exception.
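All the exceptions that Requests raises derive from requests.exceptions.RequestException, so one handler can catch anything that goes wrong, and passing a timeout makes a dead host fail fast instead of hanging. A sketch (192.0.2.1 is a reserved test address that should never respond; the timeout value is our own choice):

```python
import requests

try:
    # timeout (in seconds) bounds how long we wait for the connection
    r = requests.get('http://192.0.2.1', timeout=2)
except requests.exceptions.ConnectionError as e:
    print('Could not connect:', e)
except requests.exceptions.RequestException as e:
    # catch-all for any other error raised by Requests
    print('Request failed:', e)
```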
The Requests library significantly reduces the workload involved in using HTTP in Python as compared to urllib. Unless you have a specific requirement to use urllib, I would always recommend using Requests for your projects.