Here's how the script starts:
# scrape.py
import argparse
import base64
import json
import os
from bs4 import BeautifulSoup
import requests
Going through them from the top, we'll need to parse the arguments we feed to the script (argparse). We'll need to encode the images in base64 (base64) so that we can save them within a JSON file (json), and we'll need to work with files and paths (os). Finally, we'll need BeautifulSoup for scraping the web page easily, and requests to fetch its content. I assume you're familiar with requests, as we used it in previous chapters.
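To make the argparse part concrete, here's a minimal sketch of the kind of command-line interface such a script might set up. The flag names and choices below are illustrative assumptions, not the script's actual interface:

```python
import argparse

# Hypothetical interface: a positional URL plus a flag choosing the
# output format (individual image files or a single JSON file)
parser = argparse.ArgumentParser(description='Scrape images from a web page.')
parser.add_argument('url', help='The URL of the page to scrape.')
parser.add_argument('-f', '--format', default='img', choices=['img', 'json'],
                    help='Save images as files or as a single JSON file.')

# Passing an explicit list instead of reading sys.argv, for demonstration
args = parser.parse_args(['http://localhost:8000', '--format', 'json'])
print(args.url, args.format)
```

Running the real script would of course parse `sys.argv` instead of a hardcoded list.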
Of all these imports, only the last two don't belong to the Python standard library, so make sure you have them installed:
$ pip freeze | egrep -i "soup|requests"
beautifulsoup4==4.6.0
requests==2.18.4
Of course, the version numbers might be different for you. If they're not installed, use this command to do so:
$ pip install beautifulsoup4==4.6.0 requests==2.18.4
At this point, the only thing that I reckon might confuse you is the base64/json pair, so allow me to spend a few words on that.
As we saw in the previous chapter, JSON is one of the most popular formats for data exchange between applications. It's also widely used for other purposes, for example, to save data in a file. In our script, we're going to offer the user the ability to save images as image files, or as a single JSON file. Within the JSON, we'll put a dictionary with the image names as keys and their content as values. The only issue is that JSON is a text format and cannot hold raw binary data directly, and this is where the base64 library comes to the rescue: it encodes binary data as ASCII text, which can safely live inside a JSON string.
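To illustrate the idea, here's a small sketch of how binary image content can be base64-encoded so it fits inside a JSON file. The file name and the bytes are made up; in the real script the bytes would come from the scraped images:

```python
import base64
import json

# Stand-in for bytes read with open(path, 'rb').read()
fake_image_bytes = b'\x89PNG\r\n\x1a\n\x00\x01\x02'

# JSON values must be text, so we base64-encode the binary content
# and turn the resulting bytes into a str
images = {'logo.png': base64.b64encode(fake_image_bytes).decode('utf-8')}
payload = json.dumps(images)

# Decoding reverses the process, giving back the original bytes
restored = base64.b64decode(json.loads(payload)['logo.png'])
```

Note that `json.dumps` would raise a `TypeError` if we tried to feed it the raw bytes directly, which is exactly why the encoding step is needed.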
The base64 library is actually quite useful. For example, every time you send an email with an image attached to it, the image gets encoded with base64 before the email is sent. On the recipient side, images are automatically decoded into their original binary format so that the email client can display them.
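The mechanism behind that encode/decode cycle is easy to see for yourself. This tiny roundtrip (with made-up bytes) shows that decoding returns exactly the original binary data:

```python
import base64

original = b'binary image data \x00\x01\x02'  # stand-in for real image bytes
encoded = base64.b64encode(original)          # ASCII-safe bytes, fit for text transport
decoded = base64.b64decode(encoded)           # back to the original binary
```

The encoded form is longer than the original (roughly 4 bytes for every 3), which is the price paid for being representable as plain text.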