Fetching the web page

Let's start with the code to fetch the page and images:

config = {}

def fetch_url():
    url = _url.get()
    config['images'] = []
    _images.set(())  # initialised as an empty tuple
    try:
        page = requests.get(url)
    except requests.RequestException as err:
        _sb(str(err))
    else:
        soup = BeautifulSoup(page.content, 'html.parser')
        images = fetch_images(soup, url)
        if images:
            _images.set(tuple(img['name'] for img in images))
            _sb('Images found: {}'.format(len(images)))
        else:
            _sb('No images found')
        config['images'] = images

def fetch_images(soup, base_url):
    images = []
    for img in soup.findAll('img'):
        src = img.get('src')
        img_url = f'{base_url}/{src}'
        name = img_url.split('/')[-1]
        images.append(dict(name=name, url=img_url))
    return images

First of all, let me explain that config dictionary. We need some way of passing data between the GUI application and the business logic. Now, instead of polluting the global namespace with many different variables, my personal preference is to have a single dictionary that holds all the objects we need to pass back and forth, so that the global namespace isn't clogged up with all those names, and we have a single, clean, easy way of knowing where all the objects that are needed by our application are.

In this simple example, we'll just populate the config dictionary with the images we fetch from the page, but I wanted to show you the technique so that you have at least one example. This technique comes from my experience with JavaScript. When you code a web page, you often import several different libraries. If each of these cluttered the global namespace with all sorts of variables, there might be issues in making everything work, because of name clashes and variable overriding.

So, it's much better to leave the global namespace as clean as we can. In this case, I find that using one config variable is more than acceptable.

The fetch_url function is quite similar to what we did in the script. First, we get the url value by calling _url.get(). Remember that the _url object is a StringVar instance that is tied to the _url_entry object, which is an Entry. The text field you see on the GUI is the Entry, but the text behind the scenes is the value of the StringVar object.

By calling get() on _url, we get the value of the text, which is displayed in _url_entry.

The next step is to prepare config['images'] to be an empty list, and to empty the _images variable, which is tied to _img_listbox. This, of course, has the effect of cleaning up all the items in _img_listbox.

After this preparation step, we can try to fetch the page, using the same try/except logic we adopted in the script at the beginning of the chapter. The one difference is the action we take if things go wrong. We call _sb(str(err)). _sb is a helper function whose code we'll see shortly. Basically, it sets the text in the status bar for us. Not a good name, right? I had to explain its behavior to you–food for thought.

If we can fetch the page, then we create the soup instance, and fetch the images from it. The logic of fetch_images is exactly the same as the one explained before, so I won't repeat myself here.

If we have images, using a quick tuple comprehension (which is actually a generator expression fed to a tuple constructor) we feed the _images as StringVar and this has the effect of populating our _img_listbox with all the image names. Finally, we update the status bar.

If there were no images, we still update the status bar, and at the end of the function, regardless of how many images were found, we update config['images'] to hold the images list. In this way, we'll be able to access the images from other functions by inspecting config['images'] without having to pass that list around.

Table of Contents for Fetching the web page

Create new playlist

Sign In

Sign Up

Table of Contents for
Fetching the web page