Customizing requests with urllib

To make use of the functionality that headers provide, we add headers to a request before sending it. To do this, we need to follow these steps:

  1. Create a Request object.
  2. Add headers to the Request object.
  3. Use urlopen() to send the Request object.
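The three steps map directly onto the urllib.request API. The following is a minimal sketch: build_request() and fetch() are hypothetical helper names used only for illustration, and only fetch() touches the network.

```python
from urllib.request import Request, urlopen


def build_request(url, headers):
    """Steps 1 and 2 (hypothetical helper): create a Request, then add headers."""
    req = Request(url)                  # step 1: create the Request object
    for name, value in headers.items():
        req.add_header(name, value)     # step 2: add each header
    return req


def fetch(url, headers):
    """Step 3 (hypothetical helper): send the request and return the body bytes."""
    with urlopen(build_request(url, headers)) as response:
        return response.read()
```

For example, fetch('http://www.debian.org', {'Accept-Language': 'nl'}) would send the customized request and return the raw bytes of the page.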

We're going to learn how to customize a request to retrieve a Dutch-language version of the Debian home page. We will use the Accept-Language header, which tells the server our preferred language for the resource it returns.

First, we create a Request object:

>>> from urllib.request import Request, urlopen
>>> req = Request('http://www.debian.org')

Next, we add the header:

>>> req.add_header('Accept-Language', 'nl')

The add_header() method takes the name of the header and its value as arguments. The Accept-Language header takes language tags, which usually start with a two-letter ISO 639-1 language code. In this example, nl is the code for Dutch, the language of the Netherlands.
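Accept-Language can also list several languages in order of preference, each optionally weighted with a quality value (q) between 0 and 1, as defined for HTTP in RFC 7231, section 5.3.5. As a sketch, the hypothetical helper accept_language() below builds such a value from (code, weight) pairs; its result could be passed to add_header() in place of the plain 'nl'.

```python
def accept_language(preferences):
    """Build an Accept-Language value from (language_code, weight) pairs.

    Hypothetical helper. Weights follow HTTP quality values: 0 to 1 with at
    most three decimal places; weight 1 is the default, so it is left out.
    """
    parts = []
    for code, q in preferences:
        if not 0 <= q <= 1:
            raise ValueError('q must be between 0 and 1, got %r' % q)
        if q == 1:
            parts.append(code)
        else:
            # Three decimals at most, with trailing zeros trimmed (0.500 -> 0.5)
            weight = ('%.3f' % q).rstrip('0').rstrip('.')
            parts.append('%s;q=%s' % (code, weight))
    return ', '.join(parts)


print(accept_language([('nl', 1), ('en', 0.5)]))  # nl, en;q=0.5
```

A server that honors these weights would prefer Dutch and fall back to English.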

Lastly, we submit the customized request with urlopen():

>>> response = urlopen(req)
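How the Debian server chooses a language is not shown here. To see the whole exchange without relying on a remote site, the following self-contained sketch starts a toy local HTTP server that picks its response language from the Accept-Language header. The greeting texts, the handler, and fetch_greeting() are all hypothetical, and the server's negotiation is deliberately simplified (it only looks at the first two letters of the header).

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener

# Hypothetical page content, keyed by ISO 639-1 language code.
GREETINGS = {'en': 'Hello world', 'nl': 'Hallo wereld'}


class NegotiatingHandler(BaseHTTPRequestHandler):
    """Toy server that chooses the response language from Accept-Language."""

    def do_GET(self):
        wanted = self.headers.get('Accept-Language', 'en')
        # Simplification: only the first two letters of the value are used;
        # a real server would parse every language tag and its q value.
        lang = wanted[:2].lower()
        if lang not in GREETINGS:
            lang = 'en'
        body = GREETINGS[lang].encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Language', lang)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_greeting(language=None):
    """Start the toy server, request '/', and return (Content-Language, body)."""
    server = HTTPServer(('127.0.0.1', 0), NegotiatingHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        req = Request('http://127.0.0.1:%d/' % server.server_address[1])
        if language is not None:
            req.add_header('Accept-Language', language)
        # An empty ProxyHandler ignores proxy environment variables,
        # so the request goes straight to the local server.
        opener = build_opener(ProxyHandler({}))
        with opener.open(req) as response:
            return (response.headers['Content-Language'],
                    response.read().decode('utf-8'))
    finally:
        server.shutdown()
        server.server_close()


print(fetch_greeting('nl'))  # ('nl', 'Hallo wereld')
print(fetch_greeting())      # ('en', 'Hello world')
```

The client side is exactly the pattern from this section: create a Request, add the header, and open it. Only the server's reply changes with the header.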

We can check whether the response is in Dutch by printing its first few lines:

>>> response.readlines()[:5]

In this screenshot, we can see that the Accept-Language header changed the language of the page:

The Accept-Language header has informed the server about our preferred language for the response's content. To view the headers that a request carried when it was sent, we can call header_items() after opening it:

>>> req = Request('http://www.debian.org')
>>> req.add_header('Accept-Language', 'nl')
>>> response = urlopen(req)
>>> req.header_items()
[('Host', 'www.debian.org'), ('User-agent', 'Python-urllib/3.6'), ('Accept-language', 'nl')]
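Two details of this output are worth noting. First, add_header() stores each header name as name.capitalize(), which is why the output shows Accept-language rather than Accept-Language. Second, Host and User-agent are not added by us: urlopen()'s HTTP handler fills them in when the request is opened, so before that only our own header is present. The short sketch below shows both behaviors; note that has_header() and get_header() do not normalize the name they are given.

```python
from urllib.request import Request

req = Request('http://www.debian.org')
req.add_header('Accept-Language', 'nl')

# Not sent yet, so only the header we added is present,
# stored under the capitalized name 'Accept-language'.
print(req.header_items())                 # [('Accept-language', 'nl')]
print(req.has_header('Accept-language'))  # True
print(req.has_header('Accept-Language'))  # False: lookups are case-sensitive
print(req.get_header('Accept-language'))  # nl
```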

Let's see how to set our own header values, using the User-Agent header as an example. User-Agent identifies the client software making the request, typically the browser and the operating system it runs on. If we want to identify ourselves as a Firefox browser, we can change the user agent.
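When we set nothing, urllib sends its own default user agent, which is the Python-urllib/3.6 value seen in the earlier header_items() output. As a small sketch, that default can be read from an opener's addheaders list; default_user_agent() is a hypothetical helper name.

```python
import sys
from urllib.request import build_opener


def default_user_agent():
    """Return the User-agent value urllib sends when none is set (hypothetical helper)."""
    # build_opener() returns an OpenerDirector; its addheaders list holds
    # the default headers added to every request it opens.
    opener = build_opener()
    return dict(opener.addheaders)['User-agent']


# The default is 'Python-urllib/' followed by the interpreter's major.minor version.
expected = 'Python-urllib/%d.%d' % sys.version_info[:2]
print(default_user_agent() == expected)  # True
```

Some servers treat this default differently from a browser's user agent, which is one reason to override it.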

To change the user agent, we have two alternatives. The first is to pass a dictionary through the headers parameter of the Request constructor. The second is to create the Request object first and then call its add_header() method for each header. The add_headers_user_agent.py script uses the first approach and keeps the second as commented-out lines.
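The two alternatives produce the same request. A quick sketch builds it both ways and compares the stored headers; the URL and user agent string are the same values used in the script.

```python
from urllib.request import Request

URL = 'http://www.debian.org'
USER_AGENT = 'Mozilla/5.0 (Windows NT 5.1; rv:20.0) Gecko/20100101 Firefox/20.0'

# Alternative 1: pass a headers dictionary to the Request constructor.
req1 = Request(URL, headers={'Accept-Language': 'nl', 'User-agent': USER_AGENT})

# Alternative 2: create the Request first, then call add_header() per header.
req2 = Request(URL)
req2.add_header('Accept-Language', 'nl')
req2.add_header('User-agent', USER_AGENT)

print(req1.header_items() == req2.header_items())  # True
```

The dictionary form is convenient when the headers are known up front; add_header() suits headers decided later, after the Request already exists.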

You can find the following code in the add_headers_user_agent.py file:

#!/usr/bin/env python3
from urllib.request import Request

USER_AGENT = 'Mozilla/5.0 (Windows NT 5.1; rv:20.0) Gecko/20100101 Firefox/20.0'
URL = 'http://www.debian.org'


def add_headers_user_agent():
    # Alternative 1: pass all headers at once through the headers parameter
    headers = {'Accept-Language': 'nl', 'User-agent': USER_AGENT}
    request = Request(URL, headers=headers)
    # Alternative 2: create the Request, then add the headers one by one
    # request = Request(URL)
    # request.add_header('Accept-Language', 'nl')
    # request.add_header('User-agent', USER_AGENT)
    print("Request headers:")
    for key, value in request.header_items():
        print("%s: %s" % (key, value))


if __name__ == '__main__':
    add_headers_user_agent()

In this screenshot, we can see the request headers set by the previous script:
