Sometimes servers move their content around. They also make some content obsolete and put up new content in a different location. Sometimes they'd like us to use the more secure HTTPS protocol instead of HTTP. In all these cases, they may get traffic that asks for the old URLs, and they'd probably prefer to send those visitors to the new ones automatically.
The 300 range of HTTP status codes is designed for this purpose. These codes indicate to the client that further action is required on their part to complete the request. The most commonly encountered action is to retry the request at a different URL. This is called a redirect.
We'll learn how this works when using urllib. Let's make a request:
>>> from urllib.request import Request, urlopen
>>> req = Request('http://www.gmail.com')
>>> response = urlopen(req)
Simple enough, but now, look at the URL of the response:
>>> response.url
'https://accounts.google.com/ServiceLogin?service=mail&passive=true&rm=false...'
This is not the URL that we requested! If we open this new URL in a browser, then we'll see that it's actually the Google login page (you may need to clear your browser cookies to see this if you already have a cached Google login session). Google redirected us from http://www.gmail.com to its login page, and urllib automatically followed the redirect. Moreover, we may have been redirected more than once. Look at the redirect_dict attribute of our request object:
>>> req.redirect_dict
{'https://accounts.google.com/ServiceLogin?service=...': 1, 'https://mail.google.com/mail/': 1}
The urllib package adds every URL that we were redirected through to this dict. We can see that we have actually been redirected twice, first to https://mail.google.com, and second to the login page.
When we send our first request, the server sends a response with a redirect status code, one of 301, 302, 303, or 307. All of these indicate a redirect. This response includes a Location header, which contains the new URL. The urllib package will submit a new request to that URL, and in the aforementioned case, it will receive yet another redirect, which will lead it to the Google login page.
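To see the redirect response itself rather than the page it leads to, we can stop urllib from following it. The following is a minimal sketch under some assumptions, not the only way to do this. It runs against a throwaway local server, so the DemoHandler class, the /old and /new paths, and the inspect_redirect function are all made up for illustration. The NoRedirect handler overrides redirect_request to return None, so the 302 surfaces as an HTTPError that we can inspect:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import HTTPRedirectHandler, ProxyHandler, build_opener


class NoRedirect(HTTPRedirectHandler):
    """Decline every redirect so that the 3xx response is not followed."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "no follow-up request"; urllib then
        # reports the 3xx response as an HTTPError.
        return None


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /old redirects to /new, anything else is a 200."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'new content'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def inspect_redirect():
    """Request /old without following the redirect; return (code, Location)."""
    server = HTTPServer(('127.0.0.1', 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # ProxyHandler({}) ignores any proxy settings in the environment.
    opener = build_opener(ProxyHandler({}), NoRedirect)
    try:
        opener.open('http://127.0.0.1:%d/old' % server.server_port)
    except HTTPError as err:
        return err.code, err.headers['Location']
    finally:
        server.shutdown()
        server.server_close()
    return None  # only reached if the server did not redirect


if __name__ == '__main__':
    print(inspect_redirect())
```

Calling inspect_redirect() should return (302, '/new'): the redirect status code, and the Location header exactly as the server sent it. When urllib does follow a redirect, it resolves a relative Location like this one against the URL of the original request.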
Since urllib follows redirects for us, they generally don't affect us, but it's worth knowing that a response urllib returns may be for a URL different from what we had requested. Also, if we hit too many redirects for a single request (more than 10 for urllib), then urllib will give up and raise a urllib.error.HTTPError exception.
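We can watch the redirect limit take effect with a small experiment. This is only a sketch under stated assumptions: the local server, its EndlessRedirectHandler class, the /hop/N paths, and the hit_redirect_limit function are hypothetical names invented for this example. The server answers every request with a redirect to a URL we haven't visited yet, so urllib should eventually give up:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import ProxyHandler, build_opener


class EndlessRedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical server that redirects /hop/N to /hop/N+1 forever."""

    hits = 0  # number of requests served so far

    def do_GET(self):
        EndlessRedirectHandler.hits += 1
        hop = int(self.path.rsplit('/', 1)[-1])
        self.send_response(302)
        self.send_header('Location', '/hop/%d' % (hop + 1))
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def hit_redirect_limit():
    """Follow redirects until urllib gives up; return (code, requests made)."""
    EndlessRedirectHandler.hits = 0
    server = HTTPServer(('127.0.0.1', 0), EndlessRedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # ProxyHandler({}) ignores any proxy settings in the environment.
    opener = build_opener(ProxyHandler({}))
    try:
        opener.open('http://127.0.0.1:%d/hop/0' % server.server_port)
    except HTTPError as err:
        # urllib stopped following redirects and raised instead.
        return err.code, EndlessRedirectHandler.hits
    finally:
        server.shutdown()
        server.server_close()
    return None  # only reached if the redirects somehow ended


if __name__ == '__main__':
    print(hit_redirect_limit())
```

The HTTPError carries the status code of the last redirect response, 302 here, and the request count shows that urllib made more than 10 requests before giving up. In real code, catching HTTPError around urlopen() is one reasonable way to guard against a misconfigured server that redirects in a loop.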