One of the main advantages of using the GPU to perform image computations is that it is much faster. This increase in speed allows you to run heavy computational algorithms in real-time applications, such as stereo vision, pedestrian detection, or dense optical flow. In the following matchTemplateGPU example, we show an application that matches a template in a video sequence:
#include <iostream>
#include "opencv2/core/core.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/features2d/features2d.hpp"
#include "opencv2/gpu/gpu.hpp"
#include "opencv2/nonfree/gpu.hpp"

using namespace std;
using namespace cv;

int main(int argc, char** argv)
{
    Mat img_template_cpu = imread(argv[1], IMREAD_GRAYSCALE);
    gpu::GpuMat img_template;
    img_template.upload(img_template_cpu);

    // Detect keypoints and compute descriptors of the template
    gpu::SURF_GPU surf;
    gpu::GpuMat keypoints_template, descriptors_template;
    surf(img_template, gpu::GpuMat(), keypoints_template, descriptors_template);

    // Matcher variables
    gpu::BFMatcher_GPU matcher(NORM_L2);

    // VideoCapture from the webcam
    gpu::GpuMat img_frame;
    gpu::GpuMat img_frame_gray;
    Mat img_frame_aux;
    VideoCapture cap;
    cap.open(0);
    if (!cap.isOpened()) {
        cerr << "cannot open camera" << endl;
        return -1;
    }

    int nFrames = 0;
    uint64 totalTime = 0;

    // Main loop
    for (;;) {
        int64 start = getTickCount();
        cap >> img_frame_aux;
        if (img_frame_aux.empty())
            break;
        img_frame.upload(img_frame_aux);
        cvtColor(img_frame, img_frame_gray, CV_BGR2GRAY);

        // Step 1: Detect keypoints and compute descriptors
        gpu::GpuMat keypoints_frame, descriptors_frame;
        surf(img_frame_gray, gpu::GpuMat(), keypoints_frame, descriptors_frame);

        // Step 2: Match descriptors
        vector< vector<DMatch> > matches;
        matcher.knnMatch(descriptors_template, descriptors_frame, matches, 2);

        // Step 3: Filter results (ratio test)
        vector<DMatch> good_matches;
        float ratioT = 0.7f;
        for (int i = 0; i < (int) matches.size(); i++)
        {
            if (matches[i].size() == 2 &&
                matches[i][0].distance < ratioT * matches[i][1].distance)
            {
                good_matches.push_back(matches[i][0]);
            }
        }

        // Step 4: Download results
        vector<KeyPoint> keypoints1, keypoints2;
        vector<float> descriptors1, descriptors2;
        surf.downloadKeypoints(keypoints_template, keypoints1);
        surf.downloadKeypoints(keypoints_frame, keypoints2);
        surf.downloadDescriptors(descriptors_template, descriptors1);
        surf.downloadDescriptors(descriptors_frame, descriptors2);

        // Draw the results
        Mat img_result_matches;
        drawMatches(img_template_cpu, keypoints1, img_frame_aux, keypoints2,
                    good_matches, img_result_matches);
        imshow("Matching a template", img_result_matches);

        int64 time_elapsed = getTickCount() - start;
        double fps = getTickFrequency() / time_elapsed;
        totalTime += time_elapsed;
        nFrames++;
        cout << "FPS : " << fps << endl;

        int key = waitKey(30);
        if (key == 27)
            break;
    }
    double meanFps = getTickFrequency() / (totalTime / nFrames);
    cout << "Mean FPS: " << meanFps << endl;
    return 0;
}
The explanation of the code is given as follows. As detailed in Chapter 5, Focusing on the Interesting 2D Features, features can be used to find the correspondence between two images. The template image, which is afterwards searched for within every frame, is processed first using the GPU version of SURF (gpu::SURF_GPU surf;) to detect interest points and extract descriptors. This is accomplished by running surf(img_template, gpu::GpuMat(), keypoints_template, descriptors_template);.
The same process is performed for every frame taken from the video sequence. In order to match the descriptors of both images, a GPU version of the BruteForce matcher is also created with gpu::BFMatcher_GPU matcher(NORM_L2);. An extra step is needed because the interest points and descriptors are stored in GPU memory, and they have to be downloaded before they can be shown. That is why surf.downloadKeypoints(keypoints_template, keypoints1); and surf.downloadDescriptors(descriptors_template, descriptors1); (and the analogous calls for the frame) are executed. The following screenshot shows the example running:
The principal motivation for choosing GPU programming is performance. Therefore, this example includes time measurements to compare the speedup obtained with respect to the CPU version. Specifically, the time is saved at the beginning of the main loop of the program by means of the getTickCount() function. At the end of this loop, the same function is used together with getTickFrequency(), which makes it possible to calculate the FPS of the current frame. The time elapsed in each frame is accumulated, and at the end of the program the mean is computed. The previous example achieves an average rate of 15 FPS, whereas the same example using CPU data types and algorithms achieves a mere 0.5 FPS. Both examples have been tested on the same hardware: a PC equipped with an i5-4570 processor and an NVIDIA GeForce GTX 750 graphics card. Obviously, a 30x speedup is significant, especially when only a few lines of code need to be changed.