The task of motion detection presented in the previous section is a relatively easy one, especially for simple scenes. The real challenge appears when we have to estimate the motion between two images; that is, to come up with a motion vector that tells us how to transform the first frame into the second and vice versa.
The motion vector usually comprises two numbers (or coordinates): one giving the length of the motion in pixels, r, and one giving the direction of the motion in degrees, θ. This pair of coordinates is called polar. An equivalent way to describe the motion of a pixel is by giving the lengths of the motion, in pixels, in the horizontal and vertical directions. These coordinates are called Cartesian. In the example of the following figure, you can see all the coordinates needed to describe the motion of a pixel moving from point (x1,y1) = (0,0) to point (x2,y2) = (4,3).
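The conversion between the two coordinate systems is a one-liner in most languages (MATLAB provides cart2pol for this). As a quick sketch, here is the figure's example worked out in Python: a motion of 4 pixels horizontally and 3 pixels vertically corresponds to a length of 5 pixels at an angle of about 36.87 degrees:

```python
import math

def cartesian_to_polar(dx, dy):
    """Convert a Cartesian motion vector (dx, dy) to polar form (r, theta_deg)."""
    r = math.hypot(dx, dy)                     # length of the motion in pixels
    theta = math.degrees(math.atan2(dy, dx))   # direction of the motion in degrees
    return r, theta

# Motion from (0, 0) to (4, 3): dx = 4, dy = 3
r, theta = cartesian_to_polar(4, 3)
print(r, round(theta, 2))  # -> 5.0 36.87
```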
The task of accurate motion estimation is a very complicated one and can even become impossible when the video includes a mixture of occlusions, background motion, multiple moving objects, brightness variations, shadows, camera motion, and so on. Delving into such complex problems is beyond the scope of this book, so we will stick to easier problems with acceptable solutions.
A very popular way to estimate motion in a video is by using optical flow algorithms. Optical flow is a widely researched area of computer vision, and several algorithms, each with its own pros and cons, have been proposed. Their ultimate goal is to use spatiotemporal information from the frame sequence of a video to estimate motion vectors between consecutive pairs of frames. The specifics of how these algorithms achieve this goal are too technical for our purposes. Here, we will demonstrate the use of two of them, included in the Computer Vision System Toolbox of MATLAB, so that you get an idea of what they can do. The optical flow algorithms included in the toolbox are those of Horn and Schunck and of Lucas and Kanade.
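Although the internals of these algorithms are beyond our scope, the core idea of the Lucas-Kanade method, namely solving a small least-squares system built from image gradients over a local window, can be sketched in a few lines. The sketch below uses Python with NumPy rather than the MATLAB toolbox functions, and the two synthetic frames (a Gaussian blob translated by half a pixel) are purely illustrative:

```python
import numpy as np

def lucas_kanade_window(frame1, frame2):
    """Estimate a single (u, v) motion vector for a whole window by solving
    the Lucas-Kanade least-squares system over the image gradients."""
    Iy, Ix = np.gradient(frame1)   # spatial gradients (rows -> y, columns -> x)
    It = frame2 - frame1           # temporal gradient
    # The optical flow constraint Ix*u + Iy*v + It = 0 is summed over the
    # window, yielding a 2x2 system A [u, v]^T = -b.
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = np.array([np.sum(Ix * It), np.sum(Iy * It)])
    u, v = -np.linalg.solve(A, b)
    return u, v

# Synthetic example: a Gaussian blob translated by 0.5 pixels in x.
y, x = np.mgrid[0:32, 0:32].astype(float)
blob = lambda cx, cy: np.exp(-((x - cx)**2 + (y - cy)**2) / (2 * 5.0**2))
u, v = lucas_kanade_window(blob(15.0, 16.0), blob(15.5, 16.0))
print(round(u, 2), round(v, 2))  # close to (0.5, 0.0)
```

The Horn-Schunck method tackles the same constraint differently: instead of assuming constant motion inside a window, it adds a global smoothness term and solves for a dense flow field over the whole frame.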
The optical flow method by Horn and Schunck is described in: B. K. Horn and B. G. Schunck, Determining optical flow, Artificial Intelligence, vol. 17, no. 1–3, pp. 185–203, 1981.
The method by Lucas and Kanade can be found in: B. D. Lucas and T. Kanade, An iterative image registration technique with an application to stereo vision, in Proceedings of the 7th International Joint Conference on Artificial Intelligence, 1981.