The goal of this chapter is to study how to reconstruct a scene in 3D by inferring the geometrical features of the scene from camera motion. This technique is sometimes referred to as structure from motion. By looking at the same scene from different angles, we will be able to infer the real-world 3D coordinates of different features in the scene. This process is known as triangulation, which allows us to reconstruct the scene as a 3D point cloud.
In the previous chapter, you learned how to detect and track an object of interest in the video stream of a webcam, even if the object is viewed from different angles or distances, or under partial occlusion. Here, we will take the tracking of interesting features a step further and consider what we can learn about the entire visual scene by studying similarities between image frames. If we take two pictures of the same scene from different angles, we can use feature matching or optic flow to estimate any translational and rotational movement that the camera underwent between taking the two pictures. However, in order for this to work, we will first have to calibrate our camera.
The complete procedure involves the following steps:

1. Calibrating the camera to estimate its intrinsic matrix and distortion coefficients
2. Matching features (or computing the optic flow) between two images of the same scene
3. Using epipolar geometry to recover the camera motion between the two views
4. Rectifying the images so that they appear as if taken by a pinhole camera
5. Triangulating the matched points to reconstruct the scene as a 3D point cloud
This chapter has been tested with OpenCV 2.4.9 and wxPython 2.8 (http://www.wxpython.org/download.php). It also requires NumPy (http://www.numpy.org) and matplotlib (http://www.matplotlib.org/downloads.html). Note that if you are using OpenCV 3, you may have to obtain the so-called extra modules from https://github.com/Itseez/opencv_contrib and install OpenCV 3 with the OPENCV_EXTRA_MODULES_PATH variable set in order to get SURF installed. Also note that you may have to obtain a license to use SURF in commercial applications.
The final app will extract and visualize structure from motion on a pair of images. We will assume that these two images have been taken with the same camera, whose internal camera parameters we know. If these parameters are not known, they need to be estimated first in a camera calibration process.
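To make the role of these internal parameters concrete, here is a minimal sketch of the pinhole projection performed by the intrinsic camera matrix; the focal lengths and principal point are made-up example values, not the result of a real calibration:

```python
import numpy as np

# Hypothetical intrinsic camera matrix: focal lengths (fx, fy) on the
# diagonal and the principal point (cx, cy) in the last column.
# These are illustrative values, not a real calibration result.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# A 3D point in camera coordinates (in meters)
X = np.array([0.1, -0.05, 2.0])

# Pinhole projection: multiply by K, then divide by the depth
x = K @ X
u, v = x[:2] / x[2]
print(u, v)  # pixel coordinates of the projected point
```

Camera calibration estimates K (together with the distortion coefficients) so that this idealized projection matches what the real lens actually does.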
The final app will then consist of the following modules and scripts:
- chapter4.main: This is the main function routine for starting the application.
- scene3D.SceneReconstruction3D: This is a class that contains a range of functionalities for calculating and visualizing structure from motion. It includes the following public methods:
  - __init__: This constructor accepts the intrinsic camera matrix and the distortion coefficients
  - load_image_pair: A method used to load, from file, two images that have been taken with the camera described earlier
  - plot_optic_flow: A method used to visualize the optic flow between the two image frames
  - draw_epipolar_lines: A method used to draw the epipolar lines of the two images
  - plot_rectified_images: A method used to plot a rectified version of the two images
  - plot_point_cloud: A method used to visualize the recovered real-world coordinates of the scene as a 3D point cloud

In order to arrive at a 3D point cloud, we will need to exploit epipolar geometry. However, epipolar geometry assumes the pinhole camera model, which no real camera follows. We need to rectify our images to make them look as if they had come from a pinhole camera. For that, we need to estimate the parameters of the camera, which leads us to the field of camera calibration.
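As a preview of the triangulation step, the following sketch implements the standard linear (direct linear transform, DLT) method for recovering a 3D point from two views. The projection matrices and the toy camera setup are invented for illustration; in the app itself, OpenCV's cv2.triangulatePoints performs this job:

```python
import numpy as np

def triangulate_point(P1, P2, pt1, pt2):
    """Linear (DLT) triangulation of one scene point from two views.

    P1, P2 are 3x4 projection matrices; pt1, pt2 are the (u, v) pixel
    coordinates of the same scene point in the two images.
    """
    # Each observation contributes two linear constraints on the
    # homogeneous 3D point; stack them into a 4x4 system.
    A = np.vstack([
        pt1[0] * P1[2] - P1[0],
        pt1[1] * P1[2] - P1[1],
        pt2[0] * P2[2] - P2[0],
        pt2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest
    # singular value (the null vector of A).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Toy setup (not from the book): a camera at the origin and a second
# camera translated one unit along the x axis, both with K = I.
K = np.eye(3)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]

print(triangulate_point(P1, P2, x1, x2))  # recovers approximately [0.5, 0.2, 4.0]
```

Repeating this for every matched feature pair yields the 3D point cloud; the hard part, covered in the rest of the chapter, is obtaining accurate projection matrices from the calibrated camera and the estimated camera motion.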