Although this is not strictly data visualization in the usual sense, the ability to generate images using Python comes in handy in many cases, and this is one of them.
In this recipe, we will cover the generation of random images used to tell humans and computers apart, known as CAPTCHA images.
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart, and is trademarked by Carnegie Mellon University. This test is used to challenge computer programs (usually referred to as bots) that automatically fill various web forms that are primarily targeted at humans and that should not be automated. Usual examples are sign-up forms, login forms, surveys, and similar.
CAPTCHA itself can take various forms, but the most common form consists of a challenge where a human should read an image with distorted characters and numbers and type in the result in the related response field.
In this recipe, you will learn how to harness the Python Imaging Library (PIL, maintained today as the Pillow fork) to generate images, render lines and points, and also render text.
The following code shows what is involved in creating a personal and simple CAPTCHA generator:
from PIL import Image, ImageDraw, ImageFont
import random
import string


class SimpleCaptchaException(Exception):
    pass


class SimpleCaptcha(object):
    def __init__(self, length=5, size=(200, 100), fontsize=36,
                 random_text=None, random_bgcolor=None):
        self.size = size
        self.text = "CAPTCHA"
        self.fontsize = fontsize
        self.bgcolor = 255
        self.length = length

        self.image = None  # current captcha image

        if random_text:
            self.text = self._random_text()
        if not self.text:
            raise SimpleCaptchaException("Field text must not be empty.")
        if not self.size:
            raise SimpleCaptchaException("Size must not be empty.")
        if not self.fontsize:
            raise SimpleCaptchaException("Font size must be defined.")
        if random_bgcolor:
            self.bgcolor = self._random_color()

    def _center_coords(self, draw, font):
        # textbbox replaces textsize, which was removed in Pillow 10
        left, top, right, bottom = draw.textbbox((0, 0), self.text, font=font)
        width, height = right - left, bottom - top
        xy = (self.size[0] - width) / 2., (self.size[1] - height) / 2.
        return xy

    def _add_noise_dots(self, draw):
        size = self.image.size
        for _ in range(int(size[0] * size[1] * 0.1)):
            # randint includes both ends, so stop at the last valid pixel
            draw.point((random.randint(0, size[0] - 1),
                        random.randint(0, size[1] - 1)),
                       fill="white")
        return draw

    def _add_noise_lines(self, draw):
        size = self.image.size
        # straight lines from the left edge to the right edge
        for _ in range(8):
            width = random.randint(1, 2)
            start = (0, random.randint(0, size[1] - 1))
            end = (size[0], random.randint(0, size[1] - 1))
            draw.line([start, end], fill="white", width=width)
        # arcs inside a bounding box of random height
        for _ in range(8):
            start = (-50, -50)
            end = (size[0] + 10, random.randint(0, size[1] + 10))
            draw.arc(start + end, 0, 360, fill="white")
        return draw

    def get_captcha(self, size=None, text=None, bgcolor=None):
        # override the instance settings only when the caller provides values
        if text is not None:
            self.text = text
        if size is not None:
            self.size = size
        if bgcolor is not None:
            self.bgcolor = bgcolor

        self.image = Image.new('RGB', self.size, self.bgcolor)
        # Note that the font file must be present
        # or point to your OS's system font
        # Ex. on Mac the path should be '/Library/Fonts/Tahoma.ttf'
        font = ImageFont.truetype('fonts/Vera.ttf', self.fontsize)
        draw = ImageDraw.Draw(self.image)
        xy = self._center_coords(draw, font)
        draw.text(xy=xy, text=self.text, font=font)

        # Add some dot noise
        draw = self._add_noise_dots(draw)

        # Add some random lines
        draw = self._add_noise_lines(draw)

        self.image.show()
        return self.image, self.text

    def _random_text(self):
        letters = string.ascii_lowercase + string.ascii_uppercase
        random_text = ""
        for _ in range(self.length):
            random_text += random.choice(letters)
        return random_text

    def _random_color(self):
        r = random.randint(0, 255)
        g = random.randint(0, 255)
        b = random.randint(0, 255)
        return (r, g, b)


if __name__ == "__main__":
    sc = SimpleCaptcha(length=7, fontsize=36, random_text=True,
                       random_bgcolor=True)
    sc.get_captcha()
This example shows how to use the Python Imaging Library to generate images and, with them, build a simple CAPTCHA generator.
We wrapped the functionality into one class, SimpleCaptcha, because it gives us a safe space for future development. We also created a custom SimpleCaptchaException to accommodate future exception hierarchies.
Start reading from the main section at the end of the code listing. There, we instantiate the class, passing the settings of our future image as arguments to the constructor. Following that, we call the get_captcha method on the sc object. For this recipe's purposes, get_captcha shows the image on screen, but it also returns the image object to the caller so that the caller can make use of the result. The usage can vary; the caller could save the image to a file or, in a web application, return the image stream and the written challenge to the client requesting this CAPTCHA.
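As an illustration of the "image stream" use mentioned above, the following minimal sketch serializes a PIL image to PNG bytes in memory. The helper name image_to_png_bytes and the blank placeholder image are assumptions made for this example; in practice the image returned by get_captcha would be passed in instead.

```python
import io

from PIL import Image


def image_to_png_bytes(image):
    """Serialize a PIL image to PNG bytes without touching the disk."""
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()


# A blank stand-in for the image returned by get_captcha() (assumption).
placeholder = Image.new("RGB", (200, 100), (30, 60, 90))
png_bytes = image_to_png_bytes(placeholder)
```

A web framework could then send png_bytes to the client with an image/png content type, while a script could write the same bytes to a file.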
The important thing to note is that in order to finish the challenge-response process of the CAPTCHA test, we must return the CAPTCHA string generated on the image as text so that the caller can compare the user's response with the expected values.
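The comparison step could look something like the following sketch. The function name verify_response and its case_sensitive flag are hypothetical and not part of the recipe's class; the sketch only assumes that the caller kept the text returned by get_captcha.

```python
import hmac


def verify_response(expected, response, case_sensitive=True):
    """Return True if the user's response matches the CAPTCHA text."""
    if response is None:
        return False
    response = response.strip()  # ignore accidental surrounding spaces
    if not case_sensitive:
        expected, response = expected.lower(), response.lower()
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected.encode("utf-8"),
                               response.encode("utf-8"))
```

Whether the check should be case sensitive is a design choice; the recipe mixes lowercase and uppercase letters, so a strict comparison is the default here.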
The get_captcha method first verifies the input arguments, in order to override the class defaults if the user provides custom values. After that, a new image object is instantiated by Image.new. This object is saved in self.image, where we use it to draw and write text. Having written the text to the image, we add the noise of randomly placed points and lines, as well as some arc segments.
These tasks are carried out by the _add_noise_dots and _add_noise_lines methods. The first one adds white points at random locations anywhere on the image, one point for roughly every tenth pixel, and the latter draws eight lines from the left-hand side of the image to the right-hand side of the image, followed by eight arcs.
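The coordinate logic behind the two noise helpers can be sketched separately from the drawing calls, which makes it easy to check that everything stays inside the image. The function names noise_points and noise_lines, and their density, count, and rng parameters, are assumptions made for this illustration.

```python
import random


def noise_points(size, density=0.1, rng=random):
    """Return random (x, y) pixels, about `density` points per image pixel."""
    width, height = size
    count = int(width * height * density)
    # randrange excludes the upper bound, so every point is inside the image.
    return [(rng.randrange(width), rng.randrange(height))
            for _ in range(count)]


def noise_lines(size, count=8, rng=random):
    """Return (start, end) pairs running from the left edge to the right."""
    width, height = size
    return [((0, rng.randrange(height)), (width - 1, rng.randrange(height)))
            for _ in range(count)]
```

Each point can be passed to draw.point and each pair to draw.line. Points may repeat, so the number of distinct noisy pixels can be slightly lower than the count.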
We constructed this class using some assumptions about its use. We assumed that the user will mostly want to switch on our random options (as in the main section, seven random characters on a random background color) and receive the result. That is the reasoning behind placing helper functions in the constructor to set random text and random background color. If the most frequent and effective usage turns out to be always overriding the configuration, then we would want to remove these operations from the constructor and place them in separate calls.
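One possible shape for that alternative is sketched below: the constructor only stores values, and randomization happens through explicit calls. The class name CaptchaConfig and its methods are hypothetical and exist only to illustrate the design choice; they are not part of the recipe's code.

```python
import random
import string


class CaptchaConfig:
    """Hypothetical settings holder; not part of SimpleCaptcha."""

    def __init__(self, text="CAPTCHA", bgcolor=(255, 255, 255)):
        # The constructor only stores values; it does no random work.
        self.text = text
        self.bgcolor = bgcolor

    def randomize_text(self, length=5):
        """Replace the text with `length` random ASCII letters."""
        self.text = "".join(random.choice(string.ascii_letters)
                            for _ in range(length))
        return self  # returning self allows calls to be chained

    def randomize_bgcolor(self):
        """Replace the background with a random RGB color."""
        self.bgcolor = tuple(random.randint(0, 255) for _ in range(3))
        return self
```

A caller who wants the random behavior then asks for it explicitly, for example CaptchaConfig().randomize_text(7).randomize_bgcolor(), while a caller who always overrides the settings pays nothing for unused random values.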
For example, maybe a user wants to always use English words as the CAPTCHA challenge. If this is the case, we want to be able to just call a method to provide us with results like that. This method could be get_english_captcha, and with the random logic removed from the constructor, we would then write that method to pick random words from the provided English dictionary. On a Unix system, there is a common English dictionary inside /usr/share/dict/words that we could use for this:
def get_english_captcha(self):
    words_path = '/usr/share/dict/words'
    with open(words_path, 'r') as wf:
        words = wf.readlines()
    aword = random.choice(words)
    aword = aword.strip()  # remove newline and spaces
    return self.get_captcha(text=aword)
Overall, the example of the CAPTCHA generation is not production quality and should not be used without adding more protection and randomness, such as letter rotation.
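To give an idea of what letter rotation could involve, here is a minimal sketch that renders each character on its own transparent tile, rotates the tile by a random angle, and pastes it onto the image. The function name draw_rotated_text, the tile size, and the maximum angle are assumptions made for this illustration, and the built-in default font stands in for the TrueType font used in the recipe. This sketch alone does not make the CAPTCHA production quality.

```python
import random

from PIL import Image, ImageDraw, ImageFont


def draw_rotated_text(image, text, max_angle=30, rng=random):
    """Paste each character of `text` onto `image` at a random rotation."""
    font = ImageFont.load_default()  # assumption: stand-in for a real font
    cell = image.width // (len(text) + 1)  # horizontal space per character
    tile_size = 24  # assumption: large enough for the small default font
    for index, char in enumerate(text):
        # Render the character on its own transparent tile.
        tile = Image.new("RGBA", (tile_size, tile_size), (0, 0, 0, 0))
        ImageDraw.Draw(tile).text((8, 6), char, font=font,
                                  fill=(255, 255, 255, 255))
        # expand=True grows the tile so rotated corners are not clipped.
        angle = rng.uniform(-max_angle, max_angle)
        rotated = tile.rotate(angle, expand=True)
        x = cell * index + cell // 2
        y = (image.height - rotated.height) // 2
        # The tile's own alpha channel acts as the paste mask.
        image.paste(rotated, (x, y), rotated)
    return image


canvas = Image.new("RGB", (200, 100), (40, 40, 40))
draw_rotated_text(canvas, "aBcDe")
```

With a larger TrueType font, the tile size and the text offset inside the tile would need to grow accordingly.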
If you need to protect your web forms from bots, there are already third-party Python modules and libraries that you could use. There are even specialized modules built for the existing web frameworks.
There are even web services such as reCAPTCHA (http://www.google.com/recaptcha), with an already proven Python module, recaptcha-client (https://pypi.python.org/pypi/recaptcha-client), that you can sign up for and use. It does not require any imaging libraries because the image is pulled directly from the reCAPTCHA web service, but it has other dependencies such as pycrypto. By using this web service and library, you are also helping to digitize books scanned using Optical Character Recognition (OCR) for the Google Books project, as well as old editions of The New York Times. Read more on the reCAPTCHA website.