Chapter 2. Handling Files, Cameras, and GUIs

This chapter introduces OpenCV's I/O functionality. We also discuss a project concept and the beginnings of an object-oriented design for this project, which we will flesh out in subsequent chapters.

By starting with a look at I/O capabilities and design patterns, we are building our project in the same way we would make a sandwich: from the outside in. Bread slices and spread, or endpoints and glue, come before fillings or algorithms. We choose this approach because computer vision is extroverted—it contemplates the real world outside our computer—and we want to apply all our subsequent, algorithmic work to the real world through a common interface.

Note

All the finished code for this chapter can be downloaded from my website: http://nummist.com/opencv/3923_02.zip.

Basic I/O scripts

All CV applications need to get images as input. Most also need to produce images as output. An interactive CV application might require a camera as an input source and a window as an output destination. However, other possible sources and destinations include image files, video files, and raw bytes. For example, raw bytes might be received/sent via a network connection or might be generated by an algorithm if we are incorporating procedural graphics into our application. Let's look at each of these possibilities.

Reading/Writing an image file

OpenCV provides the imread() and imwrite() functions that support various file formats for still images. The supported formats vary by system but should always include the BMP format. Typically, PNG, JPEG, and TIFF should be among the supported formats too. Images can be loaded from one file format and saved to another. For example, let's convert an image from PNG to JPEG:

import cv2

image = cv2.imread('MyPic.png')
cv2.imwrite('MyPic.jpg', image)

Note

Most of the OpenCV functionality that we use is in the cv2 module. You might come across other OpenCV guides that instead rely on the cv or cv2.cv modules, which are legacy versions. We do use cv2.cv for certain constants that are not yet redefined in cv2.

By default, imread() returns an image in BGR color format, even if the file uses a grayscale format. BGR (blue-green-red) represents the same color space as RGB (red-green-blue) but the byte order is reversed.
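
For example, we can verify the channel ordering by inspecting the loaded array's shape and, if another library expects RGB, convert the image with OpenCV's cvtColor() function. This is a minimal sketch, assuming MyPic.png exists in the working directory:

import cv2

image = cv2.imread('MyPic.png')
print image.shape  # (height, width, 3), even if the file is grayscale
# Reverse each pixel's byte order to obtain an RGB image.
rgbImage = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)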

Optionally, we may specify the mode of imread() to be CV_LOAD_IMAGE_COLOR (BGR), CV_LOAD_IMAGE_GRAYSCALE (grayscale), or CV_LOAD_IMAGE_UNCHANGED (the file's own format, which may be BGR or grayscale and may include an alpha channel). For example, let's load a PNG as a grayscale image (losing any color information in the process) and then save it as a grayscale PNG image:

import cv2

grayImage = cv2.imread('MyPic.png', cv2.CV_LOAD_IMAGE_GRAYSCALE)
cv2.imwrite('MyPicGray.png', grayImage)

The color and grayscale modes of imread() discard any alpha channel (transparency); only CV_LOAD_IMAGE_UNCHANGED preserves one. The imwrite() function requires an image to be in BGR or grayscale format with a number of bits per channel that the output format can support. For example, BMP requires 8 bits per channel, while PNG allows either 8 or 16 bits per channel.
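
As a quick sanity check, the following minimal sketch (again assuming MyPic.png exists) inspects the array shape that each mode yields:

import cv2

bgrImage = cv2.imread('MyPic.png', cv2.CV_LOAD_IMAGE_COLOR)
grayImage = cv2.imread('MyPic.png', cv2.CV_LOAD_IMAGE_GRAYSCALE)
print bgrImage.shape   # (height, width, 3)
print grayImage.shape  # (height, width)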

Converting between an image and raw bytes

Conceptually, a byte is an integer ranging from 0 to 255. In real-time graphics applications today, a pixel is typically represented by one byte per channel, though other representations are also possible.

An OpenCV image is a 2D or 3D array of type numpy.ndarray. An 8-bit grayscale image is a 2D array containing byte values. A 24-bit BGR image is a 3D array, also containing byte values. We may access these values by using an expression like image[0, 0] or image[0, 0, 0]. The first index is the pixel's y coordinate, or row, 0 being the top. The second index is the pixel's x coordinate, or column, 0 being the leftmost. The third index (if applicable) represents a color channel.

For example, in an 8-bit grayscale image with a white pixel in the upper-left corner, image[0, 0] is 255. For a 24-bit BGR image with a blue pixel in the upper-left corner, image[0, 0] is [255, 0, 0].
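
For instance, the following minimal sketch (assuming MyPic.png exists) sets the top-left pixel to blue and reads it back:

import cv2

image = cv2.imread('MyPic.png')
image[0, 0] = [255, 0, 0]  # set the top-left pixel to blue (BGR order)
print image[0, 0]          # prints [255   0   0]
print image[0, 0, 0]       # prints 255, the blue channel alone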

Note

As an alternative to using an expression like image[0, 0] or image[0, 0] = 128, we may use an expression like image.item((0, 0)) or image.itemset((0, 0), 128). The latter expressions are more efficient for single-pixel operations. However, as we will see in subsequent chapters, we usually want to perform operations on large slices of an image rather than on single pixels.
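
For example, here is a minimal sketch of the item() and itemset() calls (again assuming MyPic.png exists):

import cv2

grayImage = cv2.imread('MyPic.png', cv2.CV_LOAD_IMAGE_GRAYSCALE)
grayImage.itemset((0, 0), 128)  # set the top-left pixel to mid-gray
print grayImage.item((0, 0))    # prints 128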

Provided that an image has 8 bits per channel, we can cast it to a standard Python bytearray, which is one-dimensional:

byteArray = bytearray(image)

Conversely, provided that the bytearray contains bytes in an appropriate order, we can cast and then reshape it to get a numpy.ndarray that is an image:

grayImage = numpy.array(grayByteArray).reshape(height, width)
bgrImage = numpy.array(bgrByteArray).reshape(height, width, 3)

As a more complete example, let's convert a bytearray containing random bytes to a grayscale image and a BGR image:

import cv2
import numpy
import os

# Make an array of 120,000 random bytes.
randomByteArray = bytearray(os.urandom(120000))
flatNumpyArray = numpy.array(randomByteArray)

# Convert the array to make a 400x300 grayscale image.
grayImage = flatNumpyArray.reshape(300, 400)
cv2.imwrite('RandomGray.png', grayImage)

# Convert the array to make a 400x100 color image.
bgrImage = flatNumpyArray.reshape(100, 400, 3)
cv2.imwrite('RandomColor.png', bgrImage)

After running this script, we should have a pair of randomly generated images, RandomGray.png and RandomColor.png, in the script's directory.

Note

Here, we use Python's standard os.urandom() function to generate random raw bytes, which we then convert to a NumPy array. Note that it is also possible to generate a random NumPy array directly (and more efficiently) using a statement such as numpy.random.randint(0, 256, 120000).reshape(300, 400). The only reason we are using os.urandom() is to help demonstrate conversion from raw bytes.

Reading/Writing a video file

OpenCV provides the VideoCapture and VideoWriter classes that support various video file formats. The supported formats vary by system but should always include AVI. Via its read() method, a VideoCapture object may be polled for new frames until it reaches the end of its video file. Each frame is an image in BGR format. Conversely, an image may be passed to the write() method of a VideoWriter object, which appends the image to the file that the VideoWriter represents. Let's look at an example that reads frames from one AVI file and writes them to another AVI file with YUV encoding:

import cv2

videoCapture = cv2.VideoCapture('MyInputVid.avi')
fps = videoCapture.get(cv2.cv.CV_CAP_PROP_FPS)
size = (int(videoCapture.get(cv2.cv.CV_CAP_PROP_FRAME_WIDTH)),
        int(videoCapture.get(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT)))
videoWriter = cv2.VideoWriter(
    'MyOutputVid.avi', cv2.cv.CV_FOURCC('I','4','2','0'), fps, size)

success, frame = videoCapture.read()
while success: # Loop until there are no more frames.
    videoWriter.write(frame)
    success, frame = videoCapture.read()

The arguments to the VideoWriter constructor deserve special attention. The video's filename must be specified. Any preexisting file with that name is overwritten. A video codec must also be specified. The available codecs may vary from system to system. Options include:

  • cv2.cv.CV_FOURCC('I','4','2','0'): This is uncompressed YUV with 4:2:0 chroma subsampling. This encoding is widely compatible but produces large files. The file extension should be avi.
  • cv2.cv.CV_FOURCC('P','I','M','1'): This is MPEG-1. The file extension should be avi.
  • cv2.cv.CV_FOURCC('M','J','P','G'): This is motion-JPEG. The file extension should be avi.
  • cv2.cv.CV_FOURCC('T','H','E','O'): This is Ogg Theora. The file extension should be ogv.
  • cv2.cv.CV_FOURCC('F','L','V','1'): This is Flash video. The file extension should be flv.

A frame rate and frame size must be specified, too. Since we are copying from another video, these properties can be read via the get() method of our VideoCapture object.

Capturing camera frames

A stream of camera frames is represented by the VideoCapture class, too. However, for a camera, we construct a VideoCapture object by passing the camera's device index instead of a video's filename. Let's consider an example that captures 10 seconds of video from a camera and writes it to an AVI file:

import cv2

cameraCapture = cv2.VideoCapture(0)
fps = 30 # an assumption
size = (int(cameraCapture.get(cv2.cv.CV_CAP_PROP_FRAME_WIDTH)),
        int(cameraCapture.get(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT)))
videoWriter = cv2.VideoWriter(
    'MyOutputVid.avi', cv2.cv.CV_FOURCC('I','4','2','0'), fps, size)

success, frame = cameraCapture.read()
numFramesRemaining = 10 * fps - 1
while success and numFramesRemaining > 0:
    videoWriter.write(frame)
    success, frame = cameraCapture.read()
    numFramesRemaining -= 1

Unfortunately, the get() method of a VideoCapture object does not return an accurate value for the camera's frame rate; it always returns 0. For the purpose of creating an appropriate VideoWriter object for the camera, we have to either make an assumption about the frame rate (as we did in the code previously) or measure it using a timer. The latter approach is better, and we will cover it later in this chapter.
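
As a preview, here is a minimal sketch of the timer-based approach, using Python's standard time module (the sample size of 60 frames is an arbitrary choice for this illustration):

import cv2
import time

cameraCapture = cv2.VideoCapture(0)
numFrames = 60  # an arbitrary sample size
start = time.time()
for i in range(numFrames):
    success, frame = cameraCapture.read()
elapsed = time.time() - start
print 'Measured frame rate:', numFrames / elapsed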

The number of cameras and their ordering is, of course, system-dependent. Unfortunately, OpenCV does not provide any means of querying the number of cameras or their properties. If an invalid index is used to construct a VideoCapture object, the VideoCapture object will not yield any frames; its read() method will return (False, None).
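
However, we can at least test whether a given index opened successfully via the isOpened() method, as in this minimal sketch (index 1 here is just a hypothetical second camera):

import cv2

cameraCapture = cv2.VideoCapture(1)  # a hypothetical second camera
if not cameraCapture.isOpened():
    print 'No camera is available at index 1.'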

The read() method is inappropriate when we need to synchronize a set of cameras or a multi-head camera (such as a stereo camera or a Kinect). Then, we use the grab() and retrieve() methods instead. Note that, like read(), the retrieve() method in Python returns a tuple of a success flag and a frame. For a set of cameras:

success0 = cameraCapture0.grab()
success1 = cameraCapture1.grab()
if success0 and success1:
    _, frame0 = cameraCapture0.retrieve()
    _, frame1 = cameraCapture1.retrieve()

For a multi-head camera, we must specify a head's index as an argument to retrieve():

success = multiHeadCameraCapture.grab()
if success:
    _, frame0 = multiHeadCameraCapture.retrieve(channel=0)
    _, frame1 = multiHeadCameraCapture.retrieve(channel=1)

We will study multi-head cameras in more detail in Chapter 5, Detecting Foreground/Background Regions and Depth.

Displaying camera frames in a window

OpenCV allows named windows to be created, redrawn, and destroyed using the namedWindow(), imshow(), and destroyWindow() functions. Also, any window may capture keyboard input via the waitKey() function and mouse input via the setMouseCallback() function. Let's look at an example where we show frames of live camera input:

import cv2

clicked = False
def onMouse(event, x, y, flags, param):
    global clicked
    if event == cv2.cv.CV_EVENT_LBUTTONUP:
        clicked = True

cameraCapture = cv2.VideoCapture(0)
cv2.namedWindow('MyWindow')
cv2.setMouseCallback('MyWindow', onMouse)

print 'Showing camera feed. Click window or press any key to stop.'
success, frame = cameraCapture.read()
while success and cv2.waitKey(1) == -1 and not clicked:
    cv2.imshow('MyWindow', frame)
    success, frame = cameraCapture.read()

cv2.destroyWindow('MyWindow')

The argument to waitKey() is a number of milliseconds to wait for keyboard input. The return value is either -1 (meaning no key has been pressed) or an ASCII keycode, such as 27 for Esc. For a list of ASCII keycodes, see http://www.asciitable.com/. Also, note that Python provides a standard function, ord(), which can convert a character to its ASCII keycode. For example, ord('a') returns 97.
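
For example, here is a minimal sketch of a capture loop that stops when the user presses either Esc or the q key, comparing the return value of waitKey() against 27 and ord('q'):

import cv2

cameraCapture = cv2.VideoCapture(0)
cv2.namedWindow('MyWindow')
success, frame = cameraCapture.read()
while success:
    cv2.imshow('MyWindow', frame)
    keycode = cv2.waitKey(1)
    if keycode == 27 or keycode == ord('q'):  # Esc or the q key
        break
    success, frame = cameraCapture.read()
cv2.destroyWindow('MyWindow')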

Tip

On some systems, waitKey() may return a value that encodes more than just the ASCII keycode. (A bug is known to occur on Linux when OpenCV uses GTK as its backend GUI library.) On all systems, we can ensure that we extract just the ASCII keycode by reading the last byte from the return value, like this:

keycode = cv2.waitKey(1)
if keycode != -1:
    keycode &= 0xFF

OpenCV's window functions and waitKey() are interdependent. OpenCV windows are only updated when waitKey() is called, and waitKey() only captures input when an OpenCV window has focus.

The mouse callback passed to setMouseCallback() should take five arguments, as seen in our code sample. The callback's param argument is set as an optional third argument to setMouseCallback(). By default, it is 0. The callback's event argument is one of the following:

  • cv2.cv.CV_EVENT_MOUSEMOVE: Mouse movement
  • cv2.cv.CV_EVENT_LBUTTONDOWN: Left button down
  • cv2.cv.CV_EVENT_RBUTTONDOWN: Right button down
  • cv2.cv.CV_EVENT_MBUTTONDOWN: Middle button down
  • cv2.cv.CV_EVENT_LBUTTONUP: Left button up
  • cv2.cv.CV_EVENT_RBUTTONUP: Right button up
  • cv2.cv.CV_EVENT_MBUTTONUP: Middle button up
  • cv2.cv.CV_EVENT_LBUTTONDBLCLK: Left button double-click
  • cv2.cv.CV_EVENT_RBUTTONDBLCLK: Right button double-click
  • cv2.cv.CV_EVENT_MBUTTONDBLCLK: Middle button double-click

The mouse callback's flags argument may be some bitwise combination of the following (a usage sketch follows the list):

  • cv2.cv.CV_EVENT_FLAG_LBUTTON: The left button pressed
  • cv2.cv.CV_EVENT_FLAG_RBUTTON: The right button pressed
  • cv2.cv.CV_EVENT_FLAG_MBUTTON: The middle button pressed
  • cv2.cv.CV_EVENT_FLAG_CTRLKEY: The Ctrl key pressed
  • cv2.cv.CV_EVENT_FLAG_SHIFTKEY: The Shift key pressed
  • cv2.cv.CV_EVENT_FLAG_ALTKEY: The Alt key pressed
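
For example, here is a minimal sketch of a callback that distinguishes a plain left-click from a Ctrl-left-click (the printed messages are arbitrary):

import cv2

def onMouse(event, x, y, flags, param):
    if event == cv2.cv.CV_EVENT_LBUTTONDOWN:
        if flags & cv2.cv.CV_EVENT_FLAG_CTRLKEY:
            print 'Ctrl-click at', (x, y)
        else:
            print 'Click at', (x, y)

# The callback would be registered as before:
# cv2.setMouseCallback('MyWindow', onMouse)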

Unfortunately, OpenCV does not provide any means of handling window events. For example, we cannot stop our application when the window's close button is clicked. Due to OpenCV's limited event handling and GUI capabilities, many developers prefer to integrate it with another application framework. Later in this chapter, we will design an abstraction layer to help integrate OpenCV into any application framework.
