Now that we know how to match keypoints, let's go ahead and see how we can stitch multiple images together. Consider the following image:
Let's say we want to stitch the following image with the preceding image:
If we stitch these images, it will look something like the following one:
Now let's say we captured another part of this house, as seen in the following image:
If we stitch the preceding image with the stitched image we saw earlier, it will look something like this:
We can keep stitching images together to create a nice panoramic image. Let's take a look at the code:
    import argparse
    import cv2
    import numpy as np

    def argument_parser():
        parser = argparse.ArgumentParser(description='Stitch two images together')
        parser.add_argument("--query-image", dest="query_image", required=True,
                help="First image that needs to be stitched")
        parser.add_argument("--train-image", dest="train_image", required=True,
                help="Second image that needs to be stitched")
        parser.add_argument("--min-match-count", dest="min_match_count",
                type=int, required=False, default=10,
                help="Minimum number of matches required")
        return parser

    # Warp img2 into img1's frame of reference using the homography matrix H
    def warpImages(img1, img2, H):
        rows1, cols1 = img1.shape[:2]
        rows2, cols2 = img2.shape[:2]

        list_of_points_1 = np.float32([[0,0], [0,rows1],
                [cols1,rows1], [cols1,0]]).reshape(-1,1,2)
        temp_points = np.float32([[0,0], [0,rows2],
                [cols2,rows2], [cols2,0]]).reshape(-1,1,2)

        # Transform the corners of img2 into img1's coordinate frame
        list_of_points_2 = cv2.perspectiveTransform(temp_points, H)
        list_of_points = np.concatenate((list_of_points_1, list_of_points_2),
                axis=0)

        # Compute the bounding box that contains both sets of corners
        [x_min, y_min] = np.int32(list_of_points.min(axis=0).ravel() - 0.5)
        [x_max, y_max] = np.int32(list_of_points.max(axis=0).ravel() + 0.5)

        # Translate so that all the coordinates become non-negative
        translation_dist = [-x_min, -y_min]
        H_translation = np.array([[1, 0, translation_dist[0]],
                                  [0, 1, translation_dist[1]],
                                  [0, 0, 1]])

        output_img = cv2.warpPerspective(img2, H_translation.dot(H),
                (x_max - x_min, y_max - y_min))
        output_img[translation_dist[1]:rows1+translation_dist[1],
                translation_dist[0]:cols1+translation_dist[0]] = img1

        return output_img

    if __name__=='__main__':
        args = argument_parser().parse_args()
        img1 = cv2.imread(args.query_image, 0)
        img2 = cv2.imread(args.train_image, 0)
        min_match_count = args.min_match_count

        cv2.imshow('Query image', img1)
        cv2.imshow('Train image', img2)

        # Initialize the SIFT detector
        sift = cv2.SIFT_create()

        # Extract the keypoints and descriptors
        keypoints1, descriptors1 = sift.detectAndCompute(img1, None)
        keypoints2, descriptors2 = sift.detectAndCompute(img2, None)

        # Initialize parameters for the FLANN-based matcher
        FLANN_INDEX_KDTREE = 0
        index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
        search_params = dict(checks=50)

        # Initialize the FLANN-based matcher object
        flann = cv2.FlannBasedMatcher(index_params, search_params)

        # Compute the matches
        matches = flann.knnMatch(descriptors1, descriptors2, k=2)

        # Store all the good matches as per Lowe's ratio test
        good_matches = []
        for m1, m2 in matches:
            if m1.distance < 0.7 * m2.distance:
                good_matches.append(m1)

        if len(good_matches) > min_match_count:
            src_pts = np.float32([keypoints1[good_match.queryIdx].pt \
                    for good_match in good_matches]).reshape(-1,1,2)
            dst_pts = np.float32([keypoints2[good_match.trainIdx].pt \
                    for good_match in good_matches]).reshape(-1,1,2)

            M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
            result = warpImages(img2, img1, M)

            cv2.imshow('Stitched output', result)
            cv2.waitKey()
        else:
            print("We don't have enough matches between the two images.")
            print("Found only %d matches. We need at least %d matches." % \
                    (len(good_matches), min_match_count))
The goal here is to find the matching keypoints so that we can stitch the images together. So, the first step is to get these matching keypoints. As discussed in the previous section, we use a keypoint detector to extract the keypoints, and then use a FLANN-based matcher to match them.
You can learn more about Flann at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.192.5378&rep=rep1&type=pdf.
The FLANN-based matcher is faster than brute-force matching because it doesn't compare each descriptor against every single descriptor in the other list. Instead, it builds an index (a set of randomized k-d trees, in our case) and performs an approximate nearest-neighbor search, examining only candidates in the neighborhood of the query point. This makes it much more efficient, at the cost of occasionally missing the true nearest neighbor.
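To see what the matcher computes conceptually, here is a minimal pure-NumPy sketch of brute-force k-nearest-neighbor matching, the exhaustive search that FLANN approximates with its k-d tree index. The tiny 2D "descriptors" below are made-up examples; real SIFT descriptors are 128-dimensional:

```python
import numpy as np

def brute_force_knn(descriptors1, descriptors2, k=2):
    """For each descriptor in descriptors1, find the k nearest descriptors
    in descriptors2 by Euclidean distance, using exhaustive comparison."""
    matches = []
    for d1 in descriptors1:
        # Compare against every descriptor in the other list
        dists = np.linalg.norm(descriptors2 - d1, axis=1)
        nearest = np.argsort(dists)[:k]
        matches.append([(idx, dists[idx]) for idx in nearest])
    return matches

# Made-up toy descriptors
desc1 = np.array([[0.0, 0.0], [5.0, 5.0]])
desc2 = np.array([[0.1, 0.0], [4.9, 5.2], [10.0, 10.0]])

matches = brute_force_knn(desc1, desc2, k=2)
# matches[i] holds (index, distance) pairs for the two nearest neighbors
```

FLANN produces the same kind of (index, distance) pairs, but instead of scanning the whole list it descends the k-d trees and checks only a bounded number of leaves (the `checks` parameter in the code above).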
Once we get a list of matching keypoints, we use Lowe's ratio test to keep only the strong matches. David Lowe proposed this ratio test in order to increase the robustness of SIFT.
You can read more about this at http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf.
Basically, when we match the keypoints, we reject the matches in which the ratio of the distance to the nearest neighbor to the distance to the second nearest neighbor is greater than a certain threshold (0.7 in our code). This helps us discard the keypoints that are not distinctive enough. So, we use that concept here to keep only the good matches and discard the rest. If we don't have a sufficient number of matches, we don't proceed further; in our case, the minimum match count defaults to 10. You can play around with this input parameter to see how it affects the output.
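The ratio test itself is a one-line comparison. Here is a small sketch with made-up distances, using the same 0.7 threshold as the code above:

```python
def passes_ratio_test(nearest_dist, second_dist, ratio=0.7):
    """Keep a match only if the nearest neighbor is significantly
    closer than the second nearest neighbor (Lowe's ratio test)."""
    return nearest_dist < ratio * second_dist

# A distinctive keypoint: the best match is much closer than the runner-up
print(passes_ratio_test(50.0, 200.0))   # kept

# An ambiguous keypoint: two candidates at almost the same distance
print(passes_ratio_test(180.0, 200.0))  # rejected
```

Lowering the ratio keeps fewer but more reliable matches; raising it keeps more matches at the risk of admitting ambiguous ones.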
If we have a sufficient number of matches, we extract the corresponding keypoint locations from both images and estimate the homography matrix using RANSAC. We already discussed homography in the first chapter, so you may want to take a quick look if you need a refresher. We basically take a set of corresponding points from the two images and compute the 3 x 3 transformation matrix that maps one set onto the other.
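As a quick recap of how that 3 x 3 matrix acts on points, here is a NumPy sketch of the projective mapping that cv2.perspectiveTransform performs: convert to homogeneous coordinates, multiply by the matrix, and divide by the third coordinate. The matrix below is a made-up example (a pure translation), not one estimated from real matches:

```python
import numpy as np

def apply_homography(H, points):
    """Map 2D points through a 3x3 homography: lift to homogeneous
    coordinates, multiply by H, then divide by the third coordinate."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # (N, 3)
    mapped = pts_h.dot(H.T)                                 # (N, 3)
    return mapped[:, :2] / mapped[:, 2:3]

# A translation by (10, 20) expressed as a homography
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 20.0],
              [0.0, 0.0,  1.0]])

corners = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 50.0], [0.0, 50.0]])
print(apply_homography(H, corners))
```

A general homography has a non-trivial bottom row, which is what makes the final division necessary and gives the perspective distortion you see in the warped image.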
Now that we have the transformation, we can go ahead and stitch the images. We use the homography to transform the corners of the image being warped, keep the other image as the frame of reference (the train image, in our code), and create an output image that's big enough to hold both. Since the transformed corners can land at negative coordinates, we extract the translation that shifts everything into the output frame, fold it into the homography, and then warp. We then copy the reference image into this canvas to construct the final output. It is worth mentioning that this works for images with different aspect ratios as well, so if you get a chance, try it out and see what the output looks like.
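The bookkeeping inside warpImages can be summarized in three steps: map the second image's corners through the homography, take the bounding box of all eight corners, and translate so that every coordinate is non-negative. Here is a standalone NumPy sketch of just that computation, with a made-up translation-only homography standing in for an estimated one:

```python
import numpy as np

def output_canvas(shape1, shape2, H):
    """Compute the stitched output size (width, height) and the translation
    that shifts all pixel coordinates into non-negative territory."""
    rows1, cols1 = shape1
    rows2, cols2 = shape2

    corners1 = np.float32([[0, 0], [0, rows1], [cols1, rows1], [cols1, 0]])
    corners2 = np.float32([[0, 0], [0, rows2], [cols2, rows2], [cols2, 0]])

    # Map the second image's corners through the homography
    pts_h = np.hstack([corners2, np.ones((4, 1))]).dot(H.T)
    corners2_mapped = pts_h[:, :2] / pts_h[:, 2:3]

    # Bounding box of all eight corners
    all_corners = np.vstack([corners1, corners2_mapped])
    x_min, y_min = np.int32(all_corners.min(axis=0) - 0.5)
    x_max, y_max = np.int32(all_corners.max(axis=0) + 0.5)

    size = (x_max - x_min, y_max - y_min)  # width, height of the canvas
    translation = (-x_min, -y_min)         # shift applied before warping
    return size, translation

# Made-up example: the second image sits 80 pixels to the left of the first
H = np.array([[1.0, 0.0, -80.0],
              [0.0, 1.0,   0.0],
              [0.0, 0.0,   1.0]])

size, translation = output_canvas((100, 150), (100, 150), H)
print(size, translation)
```

The translation is folded into the warp as an extra matrix multiplied on the left of H, exactly as H_translation.dot(H) does in the stitching code.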