The Accelerate framework can be very useful for performance optimization, especially if your application performs intensive vector and matrix math, or signal and image processing. We will learn how to link the framework and process OpenCV matrices with it.
All the source code changes will be localized in the CvEffects library, so you can use the same Recipe15_DetectingFacialFeatures project. Again, you can't use the Simulator, as we're going to use the camera.
In the previous recipe, we profiled the FaceAnimator class and slightly improved the facial feature detection time by tuning the parameters. But the very first preprocessing step is still quite expensive, and we're going to optimize it using the Accelerate framework. We could write a custom NEON optimization as before, but Accelerate can be a good time saver, as it provides a wide set of optimized image processing functions. We will replace cv::cvtColor and cv::equalizeHist with calls to Accelerate functions. Histogram equalization helps the detection algorithm better tolerate illumination changes.
The following are the steps required to accomplish the task:

1. Implement two new functions, cvtColor_Accelerate and equalizeHist_Accelerate.
2. Replace the FaceAnimator::PreprocessToGray method with the new FaceAnimator::PreprocessToGray_optimized method, which calls the optimized functions.

Let's implement the described steps:
1. Add the following declarations to the Processing.hpp header file:

```cpp
// Accelerate-optimized functions
int cvtColor_Accelerate(const cv::Mat& src, cv::Mat& dst,
                        cv::Mat buff1, cv::Mat buff2);
int equalizeHist_Accelerate(const cv::Mat& src, cv::Mat& dst);
```
2. Add the Processing_Accelerate.cpp file to the CvEffects project, and insert the following code in it:

```cpp
#include <Accelerate/Accelerate.h>
#include <opencv2/core/core.hpp>

using namespace cv;

int cvtColor_Accelerate(const Mat& src, Mat& dst, Mat buff1, Mat buff2)
{
    vImagePixelCount rows = static_cast<vImagePixelCount>(src.rows);
    vImagePixelCount cols = static_cast<vImagePixelCount>(src.cols);

    // Wrap the existing OpenCV data: {data, height, width, rowBytes}, no copy
    vImage_Buffer _src   = { src.data,   rows, cols, src.step };
    vImage_Buffer _dst   = { dst.data,   rows, cols, dst.step };
    vImage_Buffer _buff1 = { buff1.data, rows, cols, buff1.step };
    vImage_Buffer _buff2 = { buff2.data, rows, cols, buff2.step };

    // RGBA -> Gray coefficients scaled by 256; only the first
    // output channel receives a non-zero weighted sum
    const int16_t matrix[4 * 4] = {  77, 0, 0, 0,
                                    150, 0, 0, 0,
                                     29, 0, 0, 0,
                                      0, 0, 0, 0 };
    int32_t divisor = 256;

    vImage_Error err;
    err = vImageMatrixMultiply_ARGB8888(&_src, &_buff1, matrix, divisor,
                                        NULL, NULL, 0);
    if (err != kvImageNoError)
        return err;   // don't split a buffer that wasn't filled

    // Channel 0 (gray) goes to dst; the other three planes are
    // written into buff2 and discarded
    err = vImageConvert_ARGB8888toPlanar8(&_buff1, &_dst,
                                          &_buff2, &_buff2, &_buff2, 0);
    return err;
}

int equalizeHist_Accelerate(const Mat& src, Mat& dst)
{
    vImagePixelCount rows = static_cast<vImagePixelCount>(src.rows);
    vImagePixelCount cols = static_cast<vImagePixelCount>(src.cols);

    vImage_Buffer _src = { src.data, rows, cols, src.step };
    vImage_Buffer _dst = { dst.data, rows, cols, dst.step };

    vImage_Error err;
    err = vImageEqualization_Planar8(&_src, &_dst, 0);
    return err;
}
```
3. Use the new functions in the FaceAnimator class. For that, add the accBuffer1_ and accBuffer2_ members of the cv::Mat type to the class declaration, and add the following method to the implementation file:

```cpp
void FaceAnimator::PreprocessToGray_optimized(Mat& frame)
{
    // Mat::create allocates only if the size or type has changed
    grayFrame_.create(frame.size(), CV_8UC1);
    accBuffer1_.create(frame.size(), frame.type());
    accBuffer2_.create(frame.size(), CV_8UC1);

    cvtColor_Accelerate(frame, grayFrame_, accBuffer1_, accBuffer2_);
    equalizeHist_Accelerate(grayFrame_, grayFrame_);
}
```
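4. Finally, call the new method instead of PreprocessToGray in FaceAnimator::detectAndAnimateFaces:

```cpp
void FaceAnimator::detectAndAnimateFaces(Mat& frame)
{
    TS(Preprocessing);
    //PreprocessToGray(frame);
    PreprocessToGray_optimized(frame);
    TE(Preprocessing);
    ...
```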
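The TS and TE timing macros come from the profiling done in the previous recipe, and their definitions aren't repeated here. If you need stand-ins to compare PreprocessToGray against PreprocessToGray_optimized, the following std::chrono-based versions are one possible, hypothetical definition (an assumption, not the book's original macros): TS records a start time, and TE prints the elapsed milliseconds.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Hypothetical stand-ins for TS/TE (assumed, not the original definitions).
// TS(name) stores a start time; TE(name) computes the elapsed time in
// milliseconds into a variable called <name>_ms and prints it.
#define TS(name) auto ts_start_##name = std::chrono::steady_clock::now()
#define TE(name)                                                       \
    double name##_ms = std::chrono::duration<double, std::milli>(      \
        std::chrono::steady_clock::now() - ts_start_##name).count();   \
    std::printf(#name " time: %.3f ms\n", name##_ms)
```

Because TE declares a variable named after the timer, each timer name can be used only once per scope.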
First of all, note that Accelerate uses the vImage_Buffer structure as an image container. This structure is quite simple, but more importantly, it can be created on top of an existing OpenCV matrix. We don't have to copy or convert data, which lets us seamlessly interleave calls to OpenCV and Accelerate without any copying overhead. The following is how we initialize vImage_Buffer using cv::Mat data:
```cpp
vImagePixelCount rows = static_cast<vImagePixelCount>(src.rows);
vImagePixelCount cols = static_cast<vImagePixelCount>(src.cols);
vImage_Buffer _src = { src.data, rows, cols, src.step };
```
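To make the zero-copy idea concrete, here is a portable sketch that builds without Accelerate. It uses a stand-in struct, ImageView (a hypothetical name), with the same field order as vImage_Buffer: data, height, width, rowBytes. The wrapped buffer has padded rows, as a cv::Mat ROI can, which is why rowBytes must come from src.step rather than being computed as cols * 4. The helpers wrapRGBA and pixelAt are likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in with the same field order as Accelerate's vImage_Buffer
// (data, height, width, rowBytes), so the sketch runs anywhere.
struct ImageView {
    void*       data;
    std::size_t height;    // rows
    std::size_t width;     // pixels per row
    std::size_t rowBytes;  // distance between rows in bytes (cv::Mat::step)
};

// Wrap an existing 4-channel, 8-bit buffer whose rows are `stride` bytes apart.
// No pixels are copied: the view only points at the caller's memory.
ImageView wrapRGBA(std::uint8_t* base, std::size_t rows, std::size_t cols,
                   std::size_t stride) {
    return ImageView{ base, rows, cols, stride };
}

// Address of channel c of pixel (y, x): rows are rowBytes apart,
// pixels are 4 bytes apart within a row.
std::uint8_t* pixelAt(const ImageView& v, std::size_t y, std::size_t x,
                      std::size_t c) {
    return static_cast<std::uint8_t*>(v.data) + y * v.rowBytes + x * 4 + c;
}
```

For example, a 2 x 3 RGBA image stored with 16-byte rows (12 bytes of pixels plus 4 bytes of padding) can be modified through the view, and the change is visible in the original storage, just as Accelerate writes land directly in the cv::Mat data.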
Unfortunately, at the time of writing, Accelerate didn't provide a dedicated RGBA-to-grayscale color conversion, so we have to use the generic vImageMatrixMultiply_ARGB8888 transformation function. Thus, cvtColor_Accelerate is implemented in two steps. First, we convert the RGBA input matrix to another four-channel matrix, whose first channel contains the required grayscale image. Then we split the resulting matrix into four planes and use only the first one.
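The two steps can be emulated in portable C++ to check the data flow without an iOS device. This sketch assumes the row-vector convention that the coefficient layout in cvtColor_Accelerate implies (output channel j is the sum over input channels i of in[i] * matrix[i*4 + j]), and it assumes truncating division. matrixMultiply4 and extractPlane0 are hypothetical helpers, not Accelerate functions, and vImage's exact rounding may differ.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Step 1 (emulated): multiply every 4-channel pixel, treated as a row vector,
// by a 4x4 integer matrix, divide by `divisor` (truncating), clamp to [0, 255].
std::vector<std::uint8_t> matrixMultiply4(const std::vector<std::uint8_t>& src,
                                          const std::array<std::int16_t, 16>& m,
                                          std::int32_t divisor) {
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t p = 0; p + 3 < src.size(); p += 4) {
        for (int j = 0; j < 4; ++j) {
            std::int32_t acc = 0;
            for (int i = 0; i < 4; ++i)
                acc += static_cast<std::int32_t>(src[p + i]) * m[i * 4 + j];
            dst[p + j] = static_cast<std::uint8_t>(
                std::clamp<std::int32_t>(acc / divisor, 0, 255));
        }
    }
    return dst;
}

// Step 2 (emulated): keep only channel 0 of every pixel, the grayscale plane.
std::vector<std::uint8_t> extractPlane0(const std::vector<std::uint8_t>& src) {
    std::vector<std::uint8_t> plane;
    plane.reserve(src.size() / 4);
    for (std::size_t p = 0; p + 3 < src.size(); p += 4)
        plane.push_back(src[p]);
    return plane;
}
```

With the matrix from cvtColor_Accelerate and a divisor of 256, white, black, and pure red RGBA pixels become the gray values 255, 0, and 76.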
It should be noted that vImageMatrixMultiply_ARGB8888 uses fixed-point arithmetic, and the numbers in the matrix variable are the RGBA-to-gray conversion coefficients multiplied by 256 and rounded to integers. That's why we use 256 to initialize the divisor:
Y = 0.299R + 0.587G + 0.114B ≈ (77R + 150G + 29B) / 256
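One way to check that the rounded integer coefficients are accurate enough is to compare them with the floating-point formula over a grid of colors. This sketch assumes truncating division; lumaFixed, lumaFloat, and maxLumaError are hypothetical helpers. Because 77 + 150 + 29 = 256, pure white still maps to exactly 255.

```cpp
#include <cassert>
#include <cmath>

// Fixed-point luma with the coefficients from cvtColor_Accelerate
// (0.299, 0.587, 0.114 scaled by 256 and rounded); truncating division assumed.
int lumaFixed(int r, int g, int b) {
    return (77 * r + 150 * g + 29 * b) / 256;
}

// Reference value computed in floating point.
double lumaFloat(int r, int g, int b) {
    return 0.299 * r + 0.587 * g + 0.114 * b;
}

// Largest absolute difference between the two over a grid of colors
// sampled every `step` levels per channel.
double maxLumaError(int step) {
    double worst = 0.0;
    for (int r = 0; r < 256; r += step)
        for (int g = 0; g < 256; g += step)
            for (int b = 0; b < 256; b += step) {
                double d = std::fabs(lumaFixed(r, g, b) - lumaFloat(r, g, b));
                if (d > worst) worst = d;
            }
    return worst;
}
```

Under these assumptions the fixed-point result stays within about 1.5 gray levels of the floating-point value: coefficient rounding contributes less than 0.5 in either direction, and truncation adds up to one more level downward.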
After the conversion, we use vImageConvert_ARGB8888toPlanar8 to extract the first channel, which holds the image intensity data.
The implementation of equalizeHist_Accelerate is much more straightforward: we simply call the vImageEqualization_Planar8 function and return its status.
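For reference, the idea behind histogram equalization fits in a few lines: build the histogram, accumulate it into a cumulative distribution (CDF), and use the normalized CDF as a lookup table. The equalize function below is a hypothetical, textbook CDF-based version for an 8-bit single-channel buffer; it is not claimed to reproduce the exact rounding of vImageEqualization_Planar8 or cv::equalizeHist.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Textbook CDF-based histogram equalization of an 8-bit single-channel image.
std::vector<std::uint8_t> equalize(const std::vector<std::uint8_t>& src) {
    std::array<std::size_t, 256> hist{};
    for (std::uint8_t v : src) ++hist[v];

    // Cumulative distribution; cdfMin is the count at the darkest occupied level.
    std::array<std::size_t, 256> cdf{};
    std::size_t running = 0, cdfMin = 0;
    for (int i = 0; i < 256; ++i) {
        running += hist[i];
        cdf[i] = running;
        if (cdfMin == 0 && running > 0) cdfMin = running;
    }

    const std::size_t total = src.size();
    if (total == cdfMin)  // empty or single-level image: nothing to stretch
        return src;

    // Lookup table: map each level to its normalized CDF position in [0, 255].
    std::array<std::uint8_t, 256> lut{};
    for (int i = 0; i < 256; ++i) {
        if (cdf[i] < cdfMin) continue;  // levels below the darkest one stay 0
        lut[i] = static_cast<std::uint8_t>(
            (cdf[i] - cdfMin) * 255.0 / (total - cdfMin) + 0.5);
    }

    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t k = 0; k < src.size(); ++k) dst[k] = lut[src[k]];
    return dst;
}
```

For example, the low-contrast pixel values {50, 50, 100, 200} are stretched to {0, 0, 128, 255}, using the full intensity range.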
As a final note, the Accelerate framework (unlike OpenCV) requires the user to allocate all input and output buffers manually and, of course, to deallocate that memory later. That's why we call the Mat::create method for the three image buffers in the PreprocessToGray_optimized method, letting cv::Mat own and eventually release that memory. Don't worry about slow memory reallocations on every frame: Mat::create returns immediately without reallocating if the matrix already has the requested size and type.
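The reuse behavior can be illustrated with a small hypothetical class, ReusableBuffer, whose create method mimics the contract of cv::Mat::create described above: return immediately if the requested geometry already matches, otherwise allocate. Counting allocations shows that per-frame calls at a fixed camera resolution allocate only once. This is a sketch of the idea, not OpenCV's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical buffer with create() semantics similar to cv::Mat::create:
// memory is (re)allocated only when the requested geometry differs.
class ReusableBuffer {
public:
    void create(std::size_t rows, std::size_t cols, std::size_t channels) {
        if (rows == rows_ && cols == cols_ && channels == channels_)
            return;  // same shape: reuse existing memory
        rows_ = rows;
        cols_ = cols;
        channels_ = channels;
        data_.assign(rows * cols * channels, 0);
        ++allocations_;
    }
    std::uint8_t* data() { return data_.data(); }
    int allocations() const { return allocations_; }

private:
    std::size_t rows_ = 0, cols_ = 0, channels_ = 0;
    std::vector<std::uint8_t> data_;
    int allocations_ = 0;
};
```

Calling create with the same 640 x 480 RGBA geometry for every frame performs a single allocation; only a change of resolution triggers another one.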
We used only three functions from the Accelerate framework, but there are many more. Please refer to the official documentation if you want to know more: http://bit.ly/3848_Accelerate. You will see that many primitives from OpenCV's core and imgproc modules have counterparts there. Although Accelerate's syntax is somewhat verbose, using this framework can be cheaper than manually optimizing every function with NEON. Also note that Accelerate's routines are tuned by Apple for each processor generation (including NEON on ARM devices), so they could provide a better speedup than manually vectorized code.