11.2. Functional Groups: What's Good for What

In the Windows directory hierarchy, the OpenCV manual is located at C:\Program Files\OpenCV\docs\index.htm. In the manual, the functions are broken up into the following groups:

M1.  Basic Structures and Operations
     Helper structures
     Array structures
     Arrays manipulation
     Matrix operations
     Dynamic data structures
     Sequences
     Sets
     Graphs
     Writing and reading structures
M2.  Image Processing and Analysis
     Drawing functions
     Gradients, edges, and corners
     Sampling, interpolation, and geometrical transforms
     Morphological operations
     Filters and color conversion
     Pyramids and the applications
     Connected components
     Image and contour moments
     Special image transforms
     Histogram recognition functions
     Utility functions
M3.  Structural Analysis
     Contour processing functions
     Geometric comparison functions
     Planar subdivisions (triangulation)
M4.  Motion Analysis and Object Tracking
     Accumulation of background statistics
     Motion templates
     Object tracking
     Optical flow
     Estimators: Kalman, condensation
M5.  Object Recognition
     Eigen objects (PCA)
     Hidden Markov models
M6.  Camera Calibration and 3D Reconstruction
M7.  Experimental Functionality
     Statistical boosting
     Stereo correspondence
     3D tracking with multiple cameras
M8.  GUI and Video Acquisition
     Easy user interface creation
     Image I/O, display, and conversion
     Video I/O functions
     WaitKey/Init system/AddSearchPath
M9.  Bibliography
M10. CvCam Camera I/O Manual
     Exit
     GetCamerasCount
     GetProperty
     Init
     Pause
     PlayAVI
     Resume
     SelectCamera
     SetProperty
     Start
     Stop

In Section 11.2.1, we first give an overview of the functions by area in the manual. In Section 11.2.2, we give some brief ideas of what function might be good for what task. Section 11.2.3 discusses what's in the demos and sample code.

11.2.1. By area

Below, M# refers to the manual sections outlined above.

M1: Structures and matrices

The first section of the manual begins by describing structures for representing subpixel-accurate points and rectangles. This is followed by array and image structures and functions for creating, releasing, copying, and setting vector or matrix data, along with other data manipulations. Image and matrix logic, arithmetic, and conversion are unified and described here along with basic statistical measures such as sums, averages, standard deviations, minimums, maximums, and norms.

OpenCV contains a full set of linear algebra routines optimized for small and typical image-sized matrices. Examples of these functions are matrix multiplication, dot products, cross products, transpose, inversion, singular value decomposition (SVD), eigenimages, covariance, Mahalanobis distance, matrix log, power and exponential, Cartesian-to-polar conversion and back, and random matrices.
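As a taste of these routines, here is a minimal C sketch (exact argument lists vary slightly across OpenCV versions; the helper name matrix_sketch is ours):

#include "cv.h"   /* OpenCV core: matrices and linear algebra */

void matrix_sketch(void)
{
    CvMat* A    = cvCreateMat(3, 3, CV_32FC1);
    CvMat* B    = cvCreateMat(3, 3, CV_32FC1);
    CvMat* C    = cvCreateMat(3, 3, CV_32FC1);
    CvMat* Ainv = cvCreateMat(3, 3, CV_32FC1);
    double l2;

    cvSetIdentity(A, cvRealScalar(2.0));   /* A = 2I */
    cvSetIdentity(B, cvRealScalar(3.0));   /* B = 3I */

    cvMatMul(A, B, C);                     /* C = A*B */
    cvInvert(A, Ainv, CV_SVD);             /* robust inversion via SVD */
    l2 = cvNorm(C, NULL, CV_L2, NULL);     /* L2 (Frobenius) norm of C */

    cvReleaseMat(&A); cvReleaseMat(&B);
    cvReleaseMat(&C); cvReleaseMat(&Ainv);
    (void)l2;
}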

Dynamic structures designed to work with images, such as linked lists, queues, and sets, are described. There is also a full set of graph and tree structures, such as those that support Delaunay triangulation. This chapter ends with functions that support reading and writing of structures.

M2: Image processing

The second chapter of the manual covers a wide variety of image processing and analysis routines. It starts with a basic set of line, conic, poly, and text drawing routines, which were included to help in real-time labeling and debugging. Next, gradient, edge finding, and corner detection routines are described. OpenCV allows some useful sampling functions, such as reading pixels from an arbitrary line in an image into a vector and extracting a subpixel-accurate rectangle or quadrangle (good for rotation) from an image.

A full set of morphological operations [37] on image objects is supported, along with other basic image filtering, thresholding, integral-image (progressive sum), and color conversion routines. These are joined by image pyramids, connected components, and standard and gradient-directed floodfills. For rapid processing, you may find and process gray-level or binary contours of an image object.

A full range of moment-processing routines are supported, including normal, spatial, central, normalized central, and Hu moments [23]. Hough [26] and distance transforms are also present.

In computer vision, histograms of objects have been found very useful for finding, tracking, and identifying objects, including deforming and articulating objects. OpenCV provides a large number of these types of operations, such as creating, releasing, copying, setting, clearing, thresholding, and normalizing multidimensional histograms. Statistical operations on histograms are allowed, and most of the ways of comparing two histograms, such as correlation, Chi-square, intersection [38, 35], and earth-mover's distance [34, 33, 32], are supported. Pairwise geometrical histograms are covered in manual section M3. In addition, you can turn histograms into probability densities and project images into these probability spaces for analysis and tracking. The chapter ends with support for most of the major methods of comparing a template to image regions, such as normalized cross correlation and absolute difference.
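For example, comparing two objects by hue histogram might look like this sketch (hue1 and hue2 are assumed to be already-extracted 8-bit hue planes; the bin count is illustrative):

#include "cv.h"

/* Compare the hue histograms of two hue-plane images. */
double compare_hue_hists(IplImage* hue1, IplImage* hue2)
{
    int    size     = 32;               /* 32 hue bins */
    float  range[]  = { 0, 180 };       /* hue range for 8-bit HSV */
    float* ranges[] = { range };
    double score;

    CvHistogram* h1 = cvCreateHist(1, &size, CV_HIST_ARRAY, ranges, 1);
    CvHistogram* h2 = cvCreateHist(1, &size, CV_HIST_ARRAY, ranges, 1);

    cvCalcHist(&hue1, h1, 0, NULL);
    cvCalcHist(&hue2, h2, 0, NULL);
    cvNormalizeHist(h1, 1.0);           /* make histograms comparable */
    cvNormalizeHist(h2, 1.0);

    /* Other methods include CV_COMP_CORREL and CV_COMP_CHISQR. */
    score = cvCompareHist(h1, h2, CV_COMP_INTERSECT);

    cvReleaseHist(&h1);
    cvReleaseHist(&h2);
    return score;
}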

M3: Structural analysis

Once gray-level or binary image object contours are found, many operations allow you to smooth, simplify, and compare contours between objects. These contour routines allow rapid finding of polygonal approximations to objects, bounding boxes, object areas, boundary lengths, and shape matching.
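A typical contour chain, sketched in the C API (the retrieval mode and the 3-pixel approximation tolerance are illustrative choices, and exact argument lists vary slightly across OpenCV versions):

#include "cv.h"
#include <math.h>

/* Find contours in a binary image, simplify each one to a polygon,
   and compute its area and bounding box. Note that cvFindContours
   may overwrite the input image. */
void contour_sketch(IplImage* binary)
{
    CvMemStorage* storage = cvCreateMemStorage(0);
    CvSeq* contours = NULL;
    CvSeq* c;

    cvFindContours(binary, storage, &contours, sizeof(CvContour),
                   CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE, cvPoint(0, 0));

    for (c = contours; c != NULL; c = c->h_next) {
        /* Douglas-Peucker polygonal simplification, 3-pixel tolerance */
        CvSeq* poly = cvApproxPoly(c, sizeof(CvContour), storage,
                                   CV_POLY_APPROX_DP, 3.0, 0);
        double area = fabs(cvContourArea(poly, CV_WHOLE_SEQ));
        CvRect box  = cvBoundingRect(poly, 0);
        /* ... filter by area or match shapes here ... */
        (void)area; (void)box;
    }
    cvReleaseMemStorage(&storage);
}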

Image geometry routines allow you to fit lines, boxes, minimum enclosing circles, and ellipses to data points. This is where you'll find routines like KMeans, convex hulls, and convexity defect analysis, along with minimum area rotated rectangles. Also in this manual section is support for 2D, pair-wise, geometrical histograms, as described in [24]. This chapter ends with routines that fully support Delaunay triangulation.

M4: Motion analysis and tracking

This chapter starts with support for learning the background of a visual scene in order to segment objects by background differencing. Objects segmented by this or any other method may then be tracked by converting successive segmentations over time into a motion history image (MHI) [17, 13]. Routines can take the gradient of MHIs to further find global and segmented motion regions.

The chapter then moves on to object tracking, first covering tracking of probability regions in images via the mean-shift and CamShift algorithms [11]. Tracking by energy-minimizing snakes [27] is also supported. Next, four methods of tracking by optical flow are discussed using Horn and Schunck's algorithm [22], Lucas and Kanade [30], block matching, and the recommended way, Lucas and Kanade in image pyramids [8].
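A sketch of the recommended pyramidal Lucas-Kanade combination follows (prev and curr are 8-bit grayscale frames; the trailing cvGoodFeaturesToTrack arguments shown are from 1.0-era headers, and the window size and pyramid depth are illustrative):

#include "cv.h"

#define MAX_CORNERS 100

void lk_sketch(IplImage* prev, IplImage* curr)
{
    CvSize sz = cvGetSize(prev);
    IplImage* eig  = cvCreateImage(sz, IPL_DEPTH_32F, 1);
    IplImage* tmp  = cvCreateImage(sz, IPL_DEPTH_32F, 1);
    IplImage* pyrA = cvCreateImage(sz, IPL_DEPTH_8U, 1);  /* pyramid buffers */
    IplImage* pyrB = cvCreateImage(sz, IPL_DEPTH_8U, 1);
    CvPoint2D32f ptsA[MAX_CORNERS], ptsB[MAX_CORNERS];
    char  status[MAX_CORNERS];
    float error[MAX_CORNERS];
    int   count = MAX_CORNERS;

    /* Pick strong corners to track in the first frame. */
    cvGoodFeaturesToTrack(prev, eig, tmp, ptsA, &count,
                          0.01, 10, NULL, 3, 0, 0.04);

    /* Track them through 3 pyramid levels with a 10x10 search window. */
    cvCalcOpticalFlowPyrLK(prev, curr, pyrA, pyrB, ptsA, ptsB, count,
                           cvSize(10, 10), 3, status, error,
                           cvTermCriteria(CV_TERMCRIT_ITER | CV_TERMCRIT_EPS,
                                          20, 0.03), 0);

    /* ptsB[i] is valid wherever status[i] != 0. */
    cvReleaseImage(&eig);  cvReleaseImage(&tmp);
    cvReleaseImage(&pyrA); cvReleaseImage(&pyrB);
}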

This chapter concludes with two key tracking algorithms, Kalman filter and condensation tracker, based on particle filtering. There is an excellent tutorial on Kalman tracking at [41]. For condensation, see http://www.dai.ed.ac.uk/CVonline/LOCAL_COPIES/ISARD1/condensation.html.
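Here is a rough sketch of the Kalman API, assuming the OpenCV 1.0-style CvKalman fields; the constant-velocity model and the noise covariances are illustrative:

#include "cv.h"
#include <string.h>

/* Constant-velocity Kalman tracker for a 2D point:
   state = (x, y, dx, dy), measurement = (x, y). */
void kalman_sketch(CvMat* z /* 2x1 measurement, filled per frame */)
{
    CvKalman* kf = cvCreateKalman(4, 2, 0);
    const float F[] = { 1, 0, 1, 0,    /* x' = x + dx */
                        0, 1, 0, 1,    /* y' = y + dy */
                        0, 0, 1, 0,
                        0, 0, 0, 1 };
    memcpy(kf->transition_matrix->data.fl, F, sizeof(F));
    cvSetIdentity(kf->measurement_matrix,    cvRealScalar(1));
    cvSetIdentity(kf->process_noise_cov,     cvRealScalar(1e-4));
    cvSetIdentity(kf->measurement_noise_cov, cvRealScalar(1e-1));

    /* Each frame: predict, then correct with the new measurement. */
    cvKalmanPredict(kf, NULL);      /* a priori state estimate */
    cvKalmanCorrect(kf, z);         /* a posteriori update      */

    cvReleaseKalman(&kf);
}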

M5: Object recognition

This chapter covers two key techniques in object recognition: eigenobjects and hidden Markov models (HMMs). There are many other recognition techniques in OpenCV, from histogram intersection [36] in M2 to boosted classifiers in M7. The HMM implementation in this section allows HMMs to feed into other HMMs. One of the OpenCV demos uses horizontal HMMs feeding into a vertical HMM, termed an embedded HMM (eHMM), to recognize faces.

M6: Camera calibration and 3D

OpenCV supports a full set of functions for doing intrinsic (internal camera parameters and lens distortions) and extrinsic (the camera's location with respect to the outside world or other cameras) camera calibration. After calibration, functions can be called to undistort a lens or to track objects in 3D (see below). Most of these techniques were developed in [43, 45, 44, 21, 10, 9], but we added routines for finding and tracking checkerboard corners in order to help fill the matrices needed for calibration. In addition, there are routines for calculating the homography matrix, finding the fundamental matrix, and finding the epipolar lines between two images. Using calibrated, epipolar-aligned images allows us to do view morphing; that is, to synthesize a new object view as a weighted linear combination of the two existing views. Other techniques for stereo correspondence and multicamera tracking are supported in manual chapter M7.

This chapter further supports tracking objects in 3D. One way of doing this is to track a checkerboard object with a calibrated camera. More general ways of tracking four or more noncoplanar points on an object using weak-strong perspective iteration (the POSIT algorithm) [18] are detailed.

M7: Recent experimental routines

The experimental routines would better be titled "recent routines." This chapter includes support for AdaBoost for face (or other object) detection. Routines support both boosted learning and recognition of objects. Also included are routines for finding stereo correspondence; we tried to combine the best of the faster methods to get good, dense, but fast correspondence. Note that to do stereo, you should align the cameras as parallel as possible so that the views cover as much of the same scene as possible. Monocular areas not covered by both cameras can cause shearing when the foreground object moves into the monocular area. Finally, routines supporting tracking objects using multiple (two or more) cameras are described.

M8: GUI and video I/O

This manual section covers image and video input and display from disk or cameras (for Windows or Linux) that are contained in the HighGUI library. The Windows install places source code for this in C:\Program Files\OpenCV\otherlibs\highgui.

Display is covered first, with functions that allow you to put up a window and display video or images there. You can also attach sliders to the window, and they can be set to control processing parameters that you set up. Functions supporting full mouse events are also available to make interacting with images and video easy. A series of functions follow that handle reading and writing images to disk as well as common image conversions.
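A minimal sketch of this pattern using HighGUI calls (the threshold slider is just an example of a processing parameter under slider control; the capture call assumes a 1.0-era HighGUI with camera support):

#include "cv.h"
#include "highgui.h"

int g_thresh = 128;   /* updated automatically by the slider */

int main(void)
{
    CvCapture* cap = cvCaptureFromCAM(0);      /* first camera */
    IplImage *frame, *gray = NULL, *bin = NULL;

    cvNamedWindow("demo", 1);
    cvCreateTrackbar("threshold", "demo", &g_thresh, 255, NULL);

    while ((frame = cvQueryFrame(cap)) != NULL) {
        if (!gray) {
            gray = cvCreateImage(cvGetSize(frame), 8, 1);
            bin  = cvCreateImage(cvGetSize(frame), 8, 1);
        }
        cvCvtColor(frame, gray, CV_BGR2GRAY);
        cvThreshold(gray, bin, g_thresh, 255, CV_THRESH_BINARY);
        cvShowImage("demo", bin);
        if (cvWaitKey(10) == 27) break;        /* Esc quits */
    }
    cvReleaseCapture(&cap);
    return 0;
}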

The next part of manual section M8 discusses functions for video I/O from either disk or camera. Writing AVIs may be done with various types of compression, including JPEG and MPEG1.

M9: Bibliography

This manual section contains a short bibliography for OpenCV, although many citations are placed directly in the function descriptions themselves.

M10: cvcam camera I/O manual

This last section of the manual is actually a submanual devoted to single or multiple camera control and capture under Windows or Linux. The library for this chapter under Windows is placed in the directory C:\Program Files\OpenCV\otherlibs\cvcam.

11.2.2. By task

This section provides suggestions regarding what functions might be good for a few of the popular vision tasks. Note that any of these functions may operate over multiple scales by using the image pyramid functions described in manual section M2.

Camera calibration, stereo depth maps

Camera calibration is directly supported by OpenCV, as discussed in Section 11.2.1. There are routines to track a calibration checkerboard to subpixel accuracy and to use these points to find the camera matrix and lens distortion parameters. This may then be used to mathematically undistort the lens or find the position of the camera relative to the calibration pattern. Two calibrated cameras may be put into stereo correspondence via a routine to calculate the fundamental matrix and another routine to find the epipolar lines.
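The corner-finding step might be sketched as follows, assuming the OpenCV 1.0-era names (earlier betas used cvFindChessBoardCornerGuesses instead); corners must hold board.width * board.height points:

#include "cv.h"

/* Find and refine chessboard corners in one calibration view. */
int find_corners(IplImage* gray, CvSize board, CvPoint2D32f* corners)
{
    int count = 0;
    int found = cvFindChessboardCorners(gray, board, corners, &count,
                                        CV_CALIB_CB_ADAPTIVE_THRESH);
    if (found)  /* refine the guesses to subpixel accuracy */
        cvFindCornerSubPix(gray, corners, count, cvSize(11, 11),
                           cvSize(-1, -1),
                           cvTermCriteria(CV_TERMCRIT_ITER | CV_TERMCRIT_EPS,
                                          30, 0.1));
    return found;
}

Corner sets collected from many views are then fed to the calibration routine (cvCalibrateCamera2 in 1.0-era versions) to recover the camera matrix and distortion coefficients.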

From there, the "experimental" section of the manual, M7, has fast routines for computing stereo correspondence and uses the found correspondence to calculate a depth image. As stated before, it is best if the cameras are aligned as parallel as possible, with as little monocular area as possible left in the scene.

Background subtraction, learning, and segmentation

There are dozens of ways that people have employed to learn a background scene. Since this is a well-used and often effective hack, I'll detail a method even if it is not well supported by OpenCV. A survey of methods may be found in [39]. The best methods are long-term and short-term adaptive. Perhaps the best current method is described in Elgammal et al.'s 2000 paper [19], in which not the pixel values themselves, but the distribution of differences, is adaptively learned using kernel estimators. We suspect that this method could be further improved by using linear-predictive adaptive filters, especially lattice filters, due to their rapid convergence rates.

Unfortunately, the above techniques are not directly supported in OpenCV. Instead, there are routines for learning mean and variance of each pixel (see "Accumulation of Background Statistics" in manual section M4). Also inside OpenCV, we could instead use the Kalman filter to track pixel values, but the normal distribution assumption behind the Kalman filter does not fit well with the typical bimodal distribution of pixel values over time. The condensation particle filter may be better at this, though at a computational cost for sampling. Within OpenCV, perhaps the best approach is just to use k-means with two or three means incrementally adapted over a window of time as each new pixel value comes in, using the same short-long scheme as employed by Elgammal et al. in [19].

Once a background model has been found, a thresholded absolute difference (cvAbsDiff) of the model with the current frame yields the candidate foreground regions. The candidate regions are indicated in a binary mask image where on = foreground candidate and off = definite background. Typically, this is a grayscale, 8-bit image, so that "on" is a value of 255 and "off" is a value of zero.
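In code this step is small. A sketch, assuming model and frame are 8-bit grayscale and Tdiff is your difference threshold (the helper name fg_candidates is ours):

#include "cv.h"

/* Candidate foreground mask: threshold the absolute difference
   between the background model and the current frame. */
void fg_candidates(IplImage* model, IplImage* frame,
                   IplImage* mask, double Tdiff)
{
    cvAbsDiff(frame, model, mask);                   /* |frame - model| */
    cvThreshold(mask, mask, Tdiff, 255, CV_THRESH_BINARY);
    /* mask: 255 = foreground candidate, 0 = definite background */
}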

Connected components

The candidate foreground region image above will be noisy, and the image will be filled with pixel "snow." To clean it up, spurious single pixels need to be deleted by performing a morphological erode followed by a morphological dilate operation. This combination is called a morphological open and can be done in one shot using the OpenCV function cvMorphologyEx with a 3 × 3 pixel, cross-shaped structuring element (the enumerated value CV_SHAPE_CROSS), performing the open operation (CV_MOP_OPEN) with one iteration.
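A sketch of that one-shot open (for the open operation, the temporary image argument of cvMorphologyEx may be NULL):

#include "cv.h"

/* Remove single-pixel "snow" with a morphological open (erode,
   then dilate) using a 3x3 cross structuring element. */
void clean_mask(IplImage* mask)
{
    IplConvKernel* cross =
        cvCreateStructuringElementEx(3, 3, 1, 1, CV_SHAPE_CROSS, NULL);
    cvMorphologyEx(mask, mask, NULL, cross, CV_MOP_OPEN, 1);
    cvReleaseStructuringElement(&cross);
}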

Next, we want to identify and label large connected groups of pixels, deleting anything "too small." The candidate foreground image is scanned, and any candidate foreground pixel (value = 255) found is used as a flood-fill seed point to mark the entire connected region using the OpenCV cvFloodFill function. Each region is marked with a different number by setting the cvFloodFill newVal to the new fill value. The first region found is marked with 1, then 2, and so on up to a maximum of 254 regions. In cvFloodFill, the lo and up difference values should be set to zero, a CvConnectedComp structure should be passed to the function, and flags should be set to 8. Once the region is filled, the area of the fill is examined (it is set in CvConnectedComp). If the filled area is below a minimum-area-size threshold Tsize, that area is erased by flooding it with a new value of zero, newVal = 0 (regions that are too small are considered noise). If the area is greater than Tsize, then it is kept and the next fill value is incremented, subject to it being less than 255.
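A sketch of that labeling loop follows; the CvScalar-based cvFloodFill arguments shown are from 1.0-era headers (earlier betas took plain doubles), and Tsize is your minimum-area threshold:

#include "cv.h"

/* Label connected candidate regions by flood filling, erasing
   regions smaller than Tsize. */
void label_regions(IplImage* mask, double Tsize)
{
    int x, y;
    int label = 1;                        /* next fill value, 1..254 */
    CvConnectedComp comp;

    for (y = 0; y < mask->height; y++) {
        uchar* row = (uchar*)(mask->imageData + y * mask->widthStep);
        for (x = 0; x < mask->width; x++) {
            if (row[x] != 255) continue;  /* only unlabeled candidates */
            cvFloodFill(mask, cvPoint(x, y), cvScalarAll(label),
                        cvScalarAll(0), cvScalarAll(0),   /* lo, up = 0 */
                        &comp, 8, NULL);                  /* 8-connected */
            if (comp.area < Tsize)        /* too small: erase as noise */
                cvFloodFill(mask, cvPoint(x, y), cvScalarAll(0),
                            cvScalarAll(0), cvScalarAll(0),
                            &comp, 8, NULL);
            else if (label < 254)
                label++;                  /* keep it; next region label */
        }
    }
}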

Getting rid of branch movement, camera jitter

Since moving branches and slight camera movement in the wind can cause many spurious foreground candidate regions, we need a false-detection suppression routine such as the one described in pages 6 to 8 of Elgammal et al.'s paper [19]. Every labeled candidate foreground pixel i has its probability recalculated by testing the pixel's value against each neighboring pixel's probability distribution in a 5 × 5 region around it, $N_{5\times 5}$. The maximum background probability so calculated is assigned to that foreground pixel:

Equation 11.1

$$P_N(x_i) = \max_{j \in N_{5\times 5}(i)} \Pr(x_i \mid B_j)$$

where $B_j$ is the background sample for the appropriate pixel j in the 5 × 5 neighborhood. If the probability of being background is greater than a threshold, then that candidate foreground pixel is labeled as background. But since this alone would knock out too many true positive pixels, we also require that the whole connected region C be found to be probabilistically a part of the background:

Equation 11.2

$$P_C = \prod_{i \in C} P_N(x_i)$$

A former candidate foreground pixel is thus demoted to background if

Equation 11.3

$$\left(P_N(x_i) > T_{P_N}\right) \wedge \left(P_C > T_{P_C}\right)$$

where $T_{P_N}$ and $T_{P_C}$ are suppression thresholds.

Stereo background subtraction

This is much like the above, except we get a depth map from two or more cameras. The basic idea is that you can also statistically learn the depth background. In detection mode, you examine as foreground only those regions that are in front of your known background. The OpenCV camera calibration and stereo correspondence routines are of great help there (manual sections M6 and M7).

People tracking

There are innumerable ways to accomplish this task. One approach is to fit a full physical model to a person in a scene. This tends to be slow and is not supported in OpenCV, so it is not discussed further. For multiple cameras, we can look to the experimental section of the manual, M7, where multiple tracked areas are put into 3D correspondence. Other typical methods of tracking whole people are to sample a histogram (M2, histogram functions) or template (M2, MatchTemplate) from a person, then scan through the future frames, back-projecting histogram intersections, earth-mover distances, or template-match scores to create a probability-of-person image. The mean-shift or CamShift algorithms (M4) can then track the recognition peaks in these images. Alternatively, people can be represented as a sequence of vertically oriented color blobs, as done in [42]. This could be accomplished in OpenCV by use of cvKMeans2, described in M3, to cluster colors, or by using the image statistical functions described in the array statistics section of M1, or by using the undocumented texture descriptors in the experimental section (look for "GLCM" in cvaux.h). A "probability of match" image can be made this way and tracked by mean shift as above.
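One way to wire the M2 histogram machinery to the M4 CamShift tracker, sketched under the assumption that a hue plane and a trained person histogram already exist (the helper name track_step is ours):

#include "cv.h"

/* One tracking step: back-project the learned histogram into a
   probability image, then follow the probability peak with CamShift. */
CvBox2D track_step(IplImage* hue, CvHistogram* person_hist,
                   IplImage* prob, CvRect* window)
{
    CvConnectedComp comp;
    CvBox2D box;

    cvCalcBackProject(&hue, prob, person_hist);  /* probability image */
    cvCamShift(prob, *window,
               cvTermCriteria(CV_TERMCRIT_ITER | CV_TERMCRIT_EPS, 10, 1),
               &comp, &box);
    *window = comp.rect;                         /* for the next frame */
    return box;                                  /* oriented object box */
}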

Another people-tracking approach, if you can get adequate background-foreground segmentation, is to use the motion history templates and motion gradients, as described in manual section M4.

Face finding and recognition

A good way to perform face finding and face recognition is by the Viola-Jones method, as described in [40] and fully implemented in OpenCV (see section M7 in the manual and the accompanying demo in the apps section of the OpenCV directory).

Face recognition may be done either through eigenimages or embedded HMMs (see manual section M5), both of which have working demos in the apps directories.

Image retrieval

Image retrieval is typically done via some form of histogram analysis. This is fully supported via the histogram learning and comparison functions described in manual section M2.

Gesture recognition for arcade

Perhaps the best approach to gesture recognition, if you can get adequate background-foreground segmentation, is to use the motion history templates and motion gradients, as described in manual section M4. For recognition, depending on your representation, you may use the histogram functions described in M2 or Mahalanobis distances in M1, or you may do some sort of eigen trajectories using eigenobjects in M5. If hand shape is also required, you can represent the hand as a gradient histogram and use the histogram recognition techniques in M2.

The calibration, stereo, and/or 3D tracking routines described in M6 and M7 can also help segment and track motion in 3D. HMMs from section M5 can be used for recognition.

Part localization, factory applications

OpenCV was not built to support factory machine vision applications, but it does have some useful functionality. The camera calibration (manual section M6) and stereo routines (M7) can help with part segmentation. Templates can be compared with the MatchTemplate function in M2. Lines may be found with the Canny edge detector or Hough transform in M2.
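For example, template-based part localization might be sketched as follows (CV_TM_CCORR_NORMED is one of several available match methods; the helper name locate_part is ours):

#include "cv.h"

/* Locate a part in an image by normalized cross correlation and
   return the top-left corner of the best-matching window. */
CvPoint locate_part(IplImage* image, IplImage* templ)
{
    CvSize rsz = cvSize(image->width  - templ->width  + 1,
                        image->height - templ->height + 1);
    IplImage* result = cvCreateImage(rsz, IPL_DEPTH_32F, 1);
    CvPoint best;
    double max_val;

    cvMatchTemplate(image, templ, result, CV_TM_CCORR_NORMED);
    cvMinMaxLoc(result, NULL, &max_val, NULL, &best, NULL);

    cvReleaseImage(&result);
    return best;
}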

There are routines for subpixel-accurate location of corners, rectangles, and quadrangles. Deciding where to move can be aided by distance transforms. Adaptive thresholds can help segment parts, and pyramid operations can do things over multiple scales, all in M2.

Part shapes can be analyzed and recognized through the extensive collection of contour operators in M2. Motion can be analyzed and compensated for using optical-flow routines in M4. Kalman and condensation particle filters for smoothing, tracking, or predicting motion are supported as described in M4.

Flying a plane

Autonomous or semiautonomous planes are popular now for sport and military applications. A plane can be flown knowing only the horizon line. Assume that a camera has been installed such that the direction of heading is the exact center of the image and that level flight corresponds to the horizontal image scan lines being tangent to the earth's curvature. It then turns out that knowing the horizon line is enough to fly the plane. The angle of the horizon line is used for roll control, and the perpendicular distance from the line to the center of the image tells the pitch of the plane.

All we need, then, is a fast, robust way of finding the horizon line. A simple heuristic for finding this line was developed by Scott Ettinger [20]. Basically, we find the line through the image that minimizes the variance on both sides of the line (sky is more like sky than ground, and vice versa). This may be done every frame by creating an image pyramid (manual section M2) of the aerial scene. On a much-reduced-scale image, we find the variance above and below the horizontal line through the center of the image using image (array) statistics (M1). We then systematically move and rotate the line until the variance is minimized on both sides of the line. We can then advance to a larger or full scale to refine the line angle and location. Note that every time we move the line, we need not recalculate the variance from scratch. Rather, as points enter a side, their sum is added to that side's variance calculation, and lost points are subtracted, as sketched below.
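A sketch of that incremental bookkeeping (the helper names are ours):

/* Running sums for one side of the candidate horizon line. When the
   line moves, pixels that change sides are added to one side's sums
   and removed from the other's, so the variance
   var = E[x^2] - (E[x])^2 never needs a full recomputation. */
typedef struct { double sum, sumsq; long n; } SideStats;

static void side_add(SideStats* s, double pix)
{ s->sum += pix; s->sumsq += pix * pix; s->n++; }

static void side_remove(SideStats* s, double pix)
{ s->sum -= pix; s->sumsq -= pix * pix; s->n--; }

static double side_variance(const SideStats* s)
{
    double mean = s->sum / s->n;
    return s->sumsq / s->n - mean * mean;
}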

OCR

The Hough transform (HoughLines) for lines, in manual section M2, can be used to find the dominant orientation of text on a page. The machinery used for statistically boosted face finding described in M7 could also be used either for finding individual letters or for finding and recognizing the letters. A Russian team created extremely fast letter recognition by thresholding letters, finding contours (cvFindContours, M2), simplifying the contours, representing them as trees of nested regions and holes, and matching the trees as described in the contour processing functions in M3.

The author has had quite good results from using the embedded HMM recognition techniques described in manual section M5 on text data. There are demos for both these routines (but applied to faces, not text) included in the apps directory of OpenCV.

11.2.3. Demos and samples

The demo applications that ship with OpenCV are discussed next.

CamShiftDemo, cvcsdemo

This is a statistically robust tracker of the mode of a probability distribution. It is based on making the tracking window of the mean-shift algorithm dynamic, which supports visual objects that change size as they move within the visual field. But this means that if your probability distributions are not compact (e.g., if they diffuse all over the image), CamShift will not work and you should switch to plain mean-shift. For the demo, the distribution that CamShift tracks is just the probability of the color that you selected from the video stream of images.

CamShiftDemo is the same as cvcsdemo; the latter uses a tcl interface.

LKDemo, cvlkdemo

This is a real-time demo of Lucas-Kanade tracking in image pyramids. Note that Lucas-Kanade is a window-based tracker, and windows are ambiguous at object boundaries. Thus, windows on a boundary may tend to slide off or stick to the background or foreground interior. LKDemo is the same as cvlkdemo; the latter just uses a tcl interface.

HaarFaceDetect, HaarTraining

This is a slight modification of the Viola-Jones AdaBoost face detector, which uses Haar-type wavelets as weak feature detectors. Training code (but not the raw face database), trained parameters, and a working real-time face detector are included here.

Hawk

Hawk is a window-based interactive C scripting system for working with OpenCV functions. It uses EiC interpretive C as its engine; some of HighGUI grew out of this. See Section 11.5, on other interfaces, for what has replaced this earlier interface.

HMMDemo

This is a working HMM-based face recognition demo, complete with a sample database to train on.[2] This demo has HMMs across the face feeding into a vertical HMM down the face, which makes the final decision. You may add to the database using a live camera. The HMM technique works well, except that we give it uniform priors (uniformly cutting up the image) to initialize training. If faces are not precisely aligned, the facial features will be blurred. This structural blurring leaves lighting as a stronger, though accidental, feature, and so this application tends to be lighting-sensitive. Putting actual eye, nose, and mouth priors into the model would probably minimize lighting dependence, but we have not tried this yet.

[2] Some early members of the Russian OpenCV development team formed the training database images.

StereoDemo

This is a console-based stereo-depth calculation application that uses cvcam for video capture and HighGUI for displaying the images. See the readme file in ...\apps\StereoDemo for instructions on how to run this application. You will need two USB cameras that are compatible with DirectShow and that can run together; the Creative WebCam is an example of such a camera. Two different cameras might also work.

This application allows automatic calibration of the cameras by tracking a checkerboard; it then runs stereo correspondence to produce the disparity/depth image.

Tracker3dDemo

This demo uses two or more cameras calibrated together to track blobs in 3D.

VMDemo, vmdemotk

This demo, complete with sample images in the Media subdirectory, uses the epipolar lines between two calibrated views to morph (interpolate) camera views anywhere between two views of an object. VMDemo and vmdemotk are the same except that the latter uses tcl and the former uses DirectShow filters.

Sample Code

On the Windows install, the sample code can be found at C:\Program Files\OpenCV\samples\c. The same directory also contains some test images that some of the sample code operates on.

The sample code consists of simple examples of the following routines:

squares.c - Uses contours to find colored, rotated squares.
pyramid_segmentation.c - Uses image pyramids for color segmentation.
motempl.c - Uses motion templates to track motion.
morphology.c - Uses morphological operators.
laplace.c - Uses the Laplace operator on an image.
kmeans.c - Finds clusters with the k-means algorithm.
kalman.c - Tracks with the Kalman filter.
fitellipse.c - Fits an ellipse to data points.
ffilldemo.c - Uses floodfill.
facedetect.c - Uses the AdaBoost-based face detector.
edge.c - Uses the Canny operator to find edges in an image.
drawing.c - Demos the drawing functions.
distrans.c - Demos the distance transform function.
DemHist.c - Demos several of the histogram functions.
delaunay.c - Performs Delaunay triangulation on a set of points.
convexhull.c - Finds the convex hull of a set of points.
