Application for checking websites

Now, we will build an application that checks the uptime of websites. The purpose of this application is to notify us when a site or domain is not available. The application visits a list of URLs and checks whether these sites are operational. If an HTTP request returns a status code in the 400-599 range, the site is not available and it would be a good idea to notify the owner.
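
To illustrate that idea, here is a minimal sketch of a single status-code check with the requests module (the check_site() helper is purely illustrative and is not part of the final script):

import requests

def check_site(url):
    # Return True if the site answers with a non-error status code
    try:
        response = requests.get(url, timeout=10)
        # Status codes of 400 and above indicate client or server errors
        return response.status_code < 400
    except requests.RequestException:
        # Connection errors and timeouts also mean the site is unreachable
        return False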

We need to adopt a concurrent approach to solve this problem: as the list of websites grows, a sequential check gives no guarantee that each site is reviewed within a fixed interval, such as every five minutes.

In the following example, we are going to use the concurrent.futures package to process the domains concurrently and check whether each website is available. The requests module will help us to obtain the status of each domain.

You can find the following code in the demo_concurrent_futures.py file:

#!/usr/bin/python3
import concurrent.futures
import requests

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page with requests module
def load_requests(domain):
    with requests.get(domain, timeout=60) as connection:
        return connection.text

In the previous code block, we define our list of URLs to check and the load_requests() function, which accepts a domain as a parameter and tries to establish a connection to it with the requests package.

In the following code block, we define the executor that we use to check the state of each domain concurrently:

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    future_executor = {executor.submit(load_requests, domain): domain for domain in URLS}
    for domain_future in concurrent.futures.as_completed(future_executor):
        domain = future_executor[domain_future]
        try:
            data = domain_future.result()
            print('%r page is %d bytes' % (domain, len(data)))
        except Exception as exception:
            print('%r generated an exception: %s' % (domain, exception))

The following is the output of the previous script, where we can see the sizes of the downloaded pages for the domains that are available:

'http://www.foxnews.com/' page is 221581 bytes
'http://www.bbc.co.uk/' page is 303120 bytes
'http://www.cnn.com/' page is 1899465 bytes
'http://some-made-up-domain.com/' generated an exception: HTTPConnectionPool(host='ww1.some-made-up-domain.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000295B3E0BEF0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

The executor is the component that manages threads (ThreadPoolExecutor) or processes (ProcessPoolExecutor). When calling the ThreadPoolExecutor constructor to get the executor object, you can set the number of workers with the max_workers parameter, depending on the number of cores in your CPU.
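
For comparison, here is a minimal sketch of the process-based variant (the cpu_bound_task() function is purely illustrative; process pools pay off for CPU-bound work, whereas I/O-bound tasks such as our HTTP checks are usually better served by threads):

import concurrent.futures

def cpu_bound_task(n):
    # Sum of squares as a stand-in for CPU-intensive work
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    # With max_workers=None, the pool defaults to the number of processors
    with concurrent.futures.ProcessPoolExecutor(max_workers=None) as executor:
        for n, result in zip([10**5, 10**6], executor.map(cpu_bound_task, [10**5, 10**6])):
            print('%d -> %d' % (n, result))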

In the previous example, we used the as_completed() function to process the results as they arrived. This function returns an iterator over the future instances given by the future_executor variable, yielding each future as it finishes. You can check the full documentation and other examples of this function at https://docs.python.org/dev/library/concurrent.futures.html#threadpoolexecutor-example.
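
A related alternative is executor.map(), which returns results in the order of the input list rather than in completion order. The following is a minimal sketch reusing the load_requests() function and URLS list defined earlier:

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # map() yields results in the same order as URLS; unlike as_completed(),
    # an exception raised in a worker propagates when its result is reached
    for domain, data in zip(URLS, executor.map(load_requests, URLS)):
        print('%r page is %d bytes' % (domain, len(data)))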
