Chapter 23. Future of OpenCV

Past and Present

OpenCV was launched in August 1999 at the Computer Vision and Pattern Recognition conference (and so turns 17 years old at the publication of this book). Gary Bradski founded OpenCV at Intel with the intention of accelerating both research on and real-world applications of computer vision in society. Few things in life go according to their original plan, but OpenCV is one of those few. As of this writing, OpenCV has nearly 3,000 functions, has had 14 million downloads, is trending well above 200,000 downloads per month, and is used daily in millions of cell phones, recognizing bar codes, stitching panoramas together, and improving images through computational photography. OpenCV is at work in robotics systems—picking lettuce, recognizing items on conveyor belts, helping self-driving cars see, flying quad-rotors, doing tracking and mapping in virtual and augmented reality systems, helping unload trucks and pallets in distribution centers, and more—and is built into the Robot Operating System (ROS). It is used in applications that promote mine safety, prevent swimming pool drownings, process Google Maps and Street View imagery, and implement Google X robotics, to name a few examples.

Since the previous edition of this book, OpenCV has been re-architected from C to modern, modular C++ compatible with STL and Boost. The library has been brought up to modern software development standards with distributed development on Git, continuous build bots, Google unit tests, comprehensive documentation, and tutorials. OpenCV was intended to be cross-platform from the beginning, when it spanned Windows, Linux, and Mac OS X. It continues active support for these desktop OSes, but now also covers mobile with Android and iOS versions. It has optimized versions for Intel architectures, ARM, NVidia GPUs, and Movidius chips, and it also works with Xilinx Zynq FPGAs. In addition to efficient C++ source code, it has extensive interfaces in Python (compatible with NumPy), Java, and MATLAB.

OpenCV has also added a new, independent repository of code maintained by users: opencv_contrib. In that repository, all routines are standalone, follow OpenCV style and documentation conventions, and pass the Buildbot tests. With opencv_contrib, OpenCV keeps up with the latest algorithms and applications in computer vision; see Appendix B for a snapshot of the directory’s contents.

OpenCV 3.x

OpenCV started as a purely C library, and version 1.0 focused mostly on building useful algorithmic content. OpenCV 2.0’s main focus was on bringing the library up to modern C++ development standards, including the move to Git, Google-style unit tests, compatibility with the STL, and of course a C++ interface. All new development has been in C++, but the older C functions were simply wrapped in C++. Along the way, complete interfaces in Python, Java, and MATLAB were added.

OpenCV 3.0 focuses on modularity; it is written entirely in native C++ so that only one code base needs to be maintained. Computer vision’s increasing success has led to a problem: there are too many potentially useful algorithms to maintain in one monolithic code base. OpenCV 3.x solves that problem by keeping a strongly supported core and turning everything else into small, independent modules that are easy to create, easier to maintain, and may be mixed and matched as desired. More and more computer vision students and research groups are releasing new algorithms built on OpenCV data structures. OpenCV 3.x makes it easy for them to produce a module complete with documentation, unit tests, and example code that can be easily linked into OpenCV (or not).

OpenCV 3.x’s independent modules will also help cloud, embedded, and mobile applications by allowing for smaller, more focused computer vision memory footprints. One of the mission statements of OpenCV is to foster increasing use of computer vision in society; embedded vision devices will help spread the use of visual sensing in robotics, mobile, security, safety, inspection, entertainment, education, and automation. For such applications, memory use is a key consideration. On the other side, cloud computing also has memory constraints—as algorithms scale across large numbers of machines running a wide mix of jobs, memory use becomes a key bottleneck.

Our hope is that by making it easy to assemble a mix of independent modules, including perhaps one’s own module, OpenCV 3.x will not only enable the aforementioned areas but also foster something that may look like a “vision app store” in opencv_contrib. Such a collection of well-defined modules that plug directly into OpenCV will allow much wider and more creative uses of vision-enabled applications. External modules might be open, closed, free, or commercial, all aimed at allowing developers who know very little about vision to infuse vision capability into their applications.

How Well Did Our Predictions Go Last Time?

In the previous edition of this book, we made some predictions about OpenCV’s future. How did we do? We said that OpenCV would support robotics and 3D; this clearly came true. One of the authors, Gary, launched a robotics company, Industrial Perception Inc., that used OpenCV and 3D vision routines to allow robots to handle boxes in distribution centers. Google bought that company in 2013. At the same time, the other author, Adrian, ran many industry, government, and military robotics contract projects incorporating OpenCV while he was working at Applied Minds.

Calibration was forecast to be expanded and to include passive and active sensing. True to form, OpenCV now includes ArUco augmented reality markers and the combination of checkerboard and ArUco patterns so you no longer need to see the whole board, and multiple cameras can see different pieces of the same calibration pattern (see Appendix C). All these routines now exist to solve more challenging calibration and multicamera pose problems.

We predicted new 3D object recognition and pose recognition capabilities, and these were also integrated—from human-defined features in linemod to deep network 3D object recognition and pose. Indeed, opencv_contrib was itself predicted as a modular repository that would make user contribution much easier.

Most of the applications predicted in the previous book, from much better stereo vision algorithms to dense optical flow, have come true. Back then, we said that 2D features would be expanded and supported by an engine; this happened in the features2d module, which covers a large percentage of the hand-crafted 2D point detectors and descriptors. Improved functionality with Google data structures is also under way. We also said that better support for approximate nearest-neighbor techniques would be added, and it was, with the incorporation of FLANN (Fast Library for Approximate Nearest Neighbors) into OpenCV. We have since run developer workshops at computer vision conferences, as outlined in the previous book. Finally, better documentation did show up (http://docs.opencv.org).
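To make the features2d-plus-FLANN pipeline just mentioned concrete, here is a minimal sketch (not an example from the library itself): ORB keypoints and descriptors are computed for two images, matched with a FLANN LSH index, and filtered with a ratio test. The image filenames are placeholders.

    #include <opencv2/opencv.hpp>
    #include <iostream>
    #include <vector>

    int main() {
        // Placeholder image paths; any two overlapping views will do.
        cv::Mat img1 = cv::imread("scene1.jpg", cv::IMREAD_GRAYSCALE);
        cv::Mat img2 = cv::imread("scene2.jpg", cv::IMREAD_GRAYSCALE);
        if (img1.empty() || img2.empty()) return -1;

        // Detect keypoints and compute descriptors with ORB (one of the
        // hand-crafted features2d detectors mentioned above).
        cv::Ptr<cv::ORB> orb = cv::ORB::create(1000);
        std::vector<cv::KeyPoint> kp1, kp2;
        cv::Mat desc1, desc2;
        orb->detectAndCompute(img1, cv::noArray(), kp1, desc1);
        orb->detectAndCompute(img2, cv::noArray(), kp2, desc2);

        // ORB descriptors are binary, so use FLANN's LSH index for matching.
        cv::FlannBasedMatcher matcher(cv::makePtr<cv::flann::LshIndexParams>(12, 20, 2));
        std::vector<std::vector<cv::DMatch> > knnMatches;
        matcher.knnMatch(desc1, desc2, knnMatches, 2);

        // Keep only matches that pass Lowe's ratio test.
        std::vector<cv::DMatch> good;
        for (size_t i = 0; i < knnMatches.size(); i++)
            if (knnMatches[i].size() == 2 &&
                knnMatches[i][0].distance < 0.75f * knnMatches[i][1].distance)
                good.push_back(knnMatches[i][0]);

        std::cout << good.size() << " good matches" << std::endl;
        return 0;
    }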

What were we wrong about? We did not yet get a more general camera interface for higher-bit or multispectral cameras. SLAM (Simultaneous Localization And Mapping) support is in, but not as a robust complete implementation. Bayesian networks were not pursued because deep networks outpaced them. We did not yet implement anything special for artists, but artists nevertheless have continued to expand the use of OpenCV.

Future Functions

This book has mentioned OpenCV’s past and detailed its present state. Here are some future directions:

Deep learning

OpenCV can already read and run networks trained in frameworks such as Caffe, Torch, and Theano. This code is at https://github.com/opencv/opencv_contrib/tree/master/modules/cnn_3dobj. You can expect to see OpenCV integrate a full deep-learning module focused on running and training in embedded systems and smart cameras, built around and expanding on an external code base called tiny-dnn.
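As a concrete illustration of running a Caffe-format network, here is a minimal sketch using the dnn module; it assumes a build that includes dnn (from opencv_contrib, or the later 3.x releases where it was promoted into the main library), and the model files, input size, and mean values are placeholders for whatever classification model you load.

    #include <opencv2/opencv.hpp>
    #include <opencv2/dnn.hpp>
    #include <iostream>

    int main() {
        // Placeholder paths to a Caffe model definition and its trained weights.
        cv::dnn::Net net = cv::dnn::readNetFromCaffe("deploy.prototxt",
                                                     "model.caffemodel");

        cv::Mat img = cv::imread("input.jpg");
        if (img.empty() || net.empty()) return -1;

        // Convert the image into a 4D blob (NCHW) with the preprocessing
        // (size and mean subtraction) the chosen model expects.
        cv::Mat blob = cv::dnn::blobFromImage(img, 1.0, cv::Size(224, 224),
                                              cv::Scalar(104, 117, 123));
        net.setInput(blob);
        cv::Mat prob = net.forward();

        // Report the highest-scoring class.
        cv::Point classId;
        double confidence;
        cv::minMaxLoc(prob.reshape(1, 1), 0, &confidence, 0, &classId);
        std::cout << "class " << classId.x << ", confidence " << confidence << std::endl;
        return 0;
    }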

Mobile

The growth in “computationally capable” cameras is still phenomenal. So, one obvious direction OpenCV will take is increasing support for mobile. This support includes algorithms as well as mobile hardware and mobile OSes. OpenCV already has ports to iOS and Android, which we hope to support by allowing smaller static memory footprints.

Glasses

Augmented reality glasses that overlay the incoming scene with data and objects will be an increasingly supported area. Tracking the user’s head pose in 3D will also aid virtual reality localization within a room. Already, we’ve expanded ArUco AR tags to ChArUco (a checkerboard combined with ArUco markers), which gives a much more accurate pose. We have some contributors working on adding SLAM support for Google Cardboard.
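For readers who want to experiment, here is a rough sketch of ChArUco pose estimation using the opencv_contrib aruco module; the camera intrinsics, board geometry, and marker dictionary below are placeholder assumptions, not values taken from this book.

    #include <opencv2/opencv.hpp>
    #include <opencv2/aruco/charuco.hpp>
    #include <vector>

    int main() {
        // Placeholder intrinsics; use your own calibration in practice.
        cv::Mat K = (cv::Mat_<double>(3, 3) << 800, 0, 320, 0, 800, 240, 0, 0, 1);
        cv::Mat dist = cv::Mat::zeros(5, 1, CV_64F);

        cv::Ptr<cv::aruco::Dictionary> dict =
            cv::aruco::getPredefinedDictionary(cv::aruco::DICT_6X6_250);
        // A 5x7 ChArUco board with 4 cm squares and 2 cm markers.
        cv::Ptr<cv::aruco::CharucoBoard> board =
            cv::aruco::CharucoBoard::create(5, 7, 0.04f, 0.02f, dict);

        cv::VideoCapture cap(0);
        cv::Mat frame;
        while (cap.read(frame)) {
            std::vector<int> ids;
            std::vector<std::vector<cv::Point2f> > corners;
            cv::aruco::detectMarkers(frame, dict, corners, ids);
            if (!ids.empty()) {
                // Interpolate chessboard corners from the detected markers,
                // then estimate the pose of the whole board.
                std::vector<cv::Point2f> chCorners;
                std::vector<int> chIds;
                cv::aruco::interpolateCornersCharuco(corners, ids, frame, board,
                                                     chCorners, chIds, K, dist);
                cv::Vec3d rvec, tvec;
                if (chIds.size() >= 4 &&
                    cv::aruco::estimatePoseCharucoBoard(chCorners, chIds, board,
                                                        K, dist, rvec, tvec))
                    cv::aruco::drawAxis(frame, K, dist, rvec, tvec, 0.1f);
            }
            cv::imshow("ChArUco pose", frame);
            if (cv::waitKey(1) == 27) break;   // Esc to quit
        }
        return 0;
    }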

Embedded apps

Embedded applications are also growing in importance and will become a whole new device area. Seeing this trend, Xilinx already has a port of OpenCV to its Zynq architecture. We can expect to see vision showing up in a range of items, from toys to security devices, automotive applications, manufacturing uses, and unmanned vehicles on land, underwater, and in the air. OpenCV wants to help enable these developments.

3D

Depth sensors are under development by many companies and will increasingly show up in mobile devices. OpenCV has a growing number of dense-depth support routines, from computing fast normals and finding surfaces to extracting and refining depth features.
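As one small example of these dense-depth routines, the following sketch uses the opencv_contrib rgbd module to re-project a depth image into 3D points and estimate a per-pixel surface normal; the depth file, intrinsics, and parameter choices are placeholder assumptions.

    #include <opencv2/opencv.hpp>
    #include <opencv2/rgbd.hpp>
    #include <iostream>

    int main() {
        // Placeholder: a 16-bit depth image (millimeters) and the depth
        // sensor's intrinsic matrix from a prior calibration.
        cv::Mat depth = cv::imread("depth.png", cv::IMREAD_ANYDEPTH);
        if (depth.empty()) return -1;
        cv::Mat K = (cv::Mat_<float>(3, 3) << 570.3f, 0, 320,
                                              0, 570.3f, 240,
                                              0, 0, 1);

        // Re-project the depth map to a 3-channel image of 3D points.
        cv::Mat points3d;
        cv::rgbd::depthTo3d(depth, K, points3d);

        // Estimate a surface normal at each 3D point.
        cv::rgbd::RgbdNormals computeNormals(
            depth.rows, depth.cols, CV_32F, K, 5,
            cv::rgbd::RgbdNormals::RGBD_NORMALS_METHOD_FALS);
        cv::Mat normals;
        computeNormals(points3d, normals);   // one CV_32FC3 normal per pixel

        std::cout << "Computed normals for " << normals.total()
                  << " pixels" << std::endl;
        return 0;
    }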

Light field cameras

This is an area dating back to 1910, with intense activity in the 1990s. We predict it will become increasingly popular, with cheaper cameras and embedded processors allowing lens arrays to capture wide multipoint views, apertures, and fields of view, perhaps using different lens configurations. Expect to see support for such cameras as they come into existence and get less expensive.

Robotics

All of the preceding features directly benefit robotics. New hardware, cheaper cameras, and radically more flexible robot arms, coupled with better planning and control algorithms, mark the start of a whole new industry in sensor-guided robotics. Several key contributors to OpenCV work in robotics, and you can expect to see continued growth in support of this area.

Cloud

Over time, expect to see support to make it easier to work across arrays of embedded cameras interoperating with servers running the same processing stack, tightly integrated with OpenCV, deep neural networks, graphics, optimization, and parallel-capable libraries. There will be some effort to have this working seamlessly on commercial providers such as Amazon and Google servers, using C++ or Python.

Online education

We would like to provide online courses that cover computer vision problem solving using OpenCV. We hope to expand our visibility at conferences and workshops and perhaps offer our own “things you need to know” conferences.

Current GSoC Work

For the last several years, Google has been kind enough, through its Google Summer of Code (GSoC) program, to support interns working over the summer on OpenCV. You may view a wiki page on these efforts. You can also view videos covering this new functionality at the following URLs:

In 2015, 15 interns were supported. This support has been invaluable both to the interns (many of whom go on to prominent positions in the field) and to OpenCV. The topics covered in 2015, almost all with accepted pull requests into OpenCV trunk, were:

Omnidirectional camera calibration and stereo 3D reconstruction

opencv_contrib/ccalib module (Baisheng Lai, Bo Li)

Structure from motion

opencv_contrib/sfm module (Edgar Riba, Vincent Rabaud)

Improved deformable part-based models

opencv_contrib/dpm module (Jiaolong Xu, Bence Magyar)

Real-time multi-object tracking using kernelized correlation filter

opencv_contrib/tracking module (Laksono Kurnianggoro, Fernando J. Iglesias Garcia)

Improved and expanded scene text detection

opencv_contrib/text module (Lluis Gomez, Vadim Pisarevsky)

Stereo correspondence improvements

opencv_contrib/stereo module (Mircea Paul Muresan, Sergei Nosov)

Structured-light system calibration

opencv_contrib/structured_light module (Roberta Ravanelli, Delia Passalacqua, Stefano Fabri, Claudia Rapuano)

Chessboard + ArUco for camera calibration

opencv_contrib/aruco module (Sergio Garrido, Prasanna Krishnasamy, Gary Bradski)

Implementation of a universal interface for deep neural network frameworks

opencv_contrib/dnn module (Vitaliy Lyudvichenko, Anatoly Baksheev) [this may be replaced by tiny-dnn in the future]

Recent advances in edge-aware filtering, improved SGBM stereo algorithm

opencv/calib3d and opencv_contrib/ximgproc modules (Alexander Bokov, Maksim Shabunin)

Improved ICF detector, Waldboost implementation

opencv_contrib/xobjdetect module (Vlad Shakhuro, Alexander Bovyrin)

Multitarget TLD tracking

opencv_contrib/tracking module (Vladimir Tyan, Antonella Cascitelli)

3D pose estimation using CNNs

opencv_contrib/cnn_3dobj module (Yida Wang, Manuele Tamburrano, Stefano Fabri)

As of the final editing of this book, the following 13 new algorithms are being worked on for GSoC 2016:

  • Adding tiny-dnn deep learning training and test functions into OpenCV (Edgar Riba, Yida Wang, Stefano Fabri, Manuele Tamburrano, Taiga Nomi, Gary Bradski)

  • Enhancing the existing dnn module to read and run Caffe models (Vludv, Anatoly Baksheev)

  • Better visual tracking, GOTURN tracker (Tyan Vladimir, Antonella Cascitelli)

  • Accurate, dynamic structured light (Ambroise Moreau, Delia Passalacqua)

  • Adding very fast, dense optical flow (Alexander Bokov, Maksim Shabunin)

  • Extending the text module with deep word-spotting CNN (Anguelos, Lluis Gomez)

  • Improvement of the dense optical flow algorithm (VladX, Ethan Rublee)

  • Multilanguage support in OpenCV tutorials: Python, C++, and Java (Carucho, Vincent Rabaud)

  • New image stitching pipeline (Jiri Horner, Bo Li)

  • Adding better file storage for OpenCV (Myls, Vadim Pisarevsky)

Community Contributions

The OpenCV community has become much more active as well. During the time of GSoC 2015, the community contributed:

  • A plotting module (Nuno Moutinho)

  • Niblack thresholding algorithm: ximgproc (Samyak Datta)

  • Superpixel segmentation using linear spectral clustering, SLIC superpixels: ximgproc (Balint Cristian)

  • HDF (HDF5) support module (Balint Cristian)

  • Depth to external RGB camera registration: rgbd (Pat O’Keefe)

  • Computing normals for a point cloud: rgbd (Félix Martel-Denis)

  • Fuzzy image processing (Pavel Vlasanek)

  • Rolling guidance filter: ximgproc (Zhou Chao)

  • 3× faster SimpleFlow: optflow (Francisco Facioni)

  • Code and docs for CVPR 2015 paper “DNNs Are Easily Fooled” (Anh Nguyen)

  • Efficient graph-based image segmentation algorithm: ximgproc (Maximilien Cuony)

  • Sparse-to-dense optical flow: optflow (Sergey Bokov)

  • Unscented Kalman filter (UKF) and augmented UKF tracking (Svetlana Filicheva)

  • Fast Hough transform: ximgproc (xolodilnik)

  • Improved performance of haartraining (Teng Cao)

  • Python samples made compatible with Python 3 (bastelflp)

We hope that Google and the community continue this great work!

OpenCV.org

In the time between the publication of the book’s previous edition and this one, OpenCV became a California nonprofit foundation aimed at advancing computer vision in general, promoting computer vision education, and providing OpenCV as a free and open infrastructure for furthering vision algorithms in particular. To date, the foundation has had support from Intel, Google, Willow Garage, and NVidia. In addition, DARPA (through Intel) provided funding for a “People’s Choice Best Paper” award at CVPR (Computer Vision and Pattern Recognition) 2015, and Intel has sponsored this contest to run again in 2016. The results from 2015 are available online. The winning entries resulted in several new algorithms hosted in the .../opencv_contrib directory. Prebuilt code for OpenCV can be downloaded from the user site, while raw code can be obtained from the developer site; see https://github.com/opencv/opencv for the core library and https://github.com/opencv/opencv_contrib for the user-contributed modules. The wiki for OpenCV is at https://github.com/opencv/opencv/wiki. There is also a Facebook page.

As the writing of this book comes to a close, the founding author, Gary Bradski, is in the process of turning OpenCV.org into a federal nonprofit 501(c)(3) corporation. Previously, OpenCV had no paid staff (beyond summer mentor stipends provided by Google), no office, and no equipment, and had been trying to pay out within the same year everything that came in. Now, there is an effort under way to turn OpenCV.org into a robust, full-featured nonprofit. This will involve bringing on some dedicated board members (unpaid), raising funds to support some paid full-time staff, developing educational materials and contests, putting on annual conferences that would emphasize new, useful vision solutions, providing in-depth training tutorials, sponsoring or at least supporting greater sensing and autonomy in robotics leagues, providing support and education for learning and using computer vision at the high school level, offering support and training for using computer vision in the artist community, and more.

We also hope to add more cooperation with OpenCV in China, founded by Prof. Ruizhen Liu. This effort is hosted at the Shanghai Academy of Artificial Intelligence (also known as AIV: Artificial Intelligence Valley), which is sponsored by the Chinese Academy of Sciences and Fudan University. It is a subscriber organization that aims to become an independent research institute focused on artificial intelligence, automation, intelligent device control, and pattern recognition. Over time, we hope to increase similar links to other organizations around the world.

If OpenCV.org can generate enough funding, it is possible that OpenCV can offer full-time phone and web support, develop courseware in vision and machine learning (possibly including partnering with manufacturers to provide compatible development kits), and certify vision developers who can be trusted to build applications in computer vision and deep learning perception. We may also develop a certification program for other camera functionality offered by partners where “Certified by OpenCV” can become a trusted brand. In so doing, we look forward to vastly expanding the reach and scope of OpenCV!

Some AI Speculation

We are clearly at a turning point in the development of artificial intelligence (AI). As of this writing, AlphaGo, from Google’s DeepMind group, has beaten the world champion, Lee Sedol, at the very difficult “spatial strategy” game Go. Using AI, robots are learning to drive, fly, walk, and manipulate objects. Meanwhile, AI technology is making speech, sound, music, and image recognition natural in our devices and across the Web. Silicon Valley has seen many “gold rushes” since the original one for real gold in 1848. The winners and losers in this new AI gold rush remain to be seen, but it is clear that the world will never be the same. In its function of accelerating progress in perception, OpenCV plays a role in this historical movement toward sentient (self-aware) machines.

It’s clear that deep neural networks have essentially solved the problem of feed-forward recognition of patterns (one can say that they are superb function approximators), but such networks are nowhere near sentient or “alive.” First, there is the problem of experience itself. We humans don’t just see, say, a color; we experience it subjectively. How this subjective experience arises is called the problem of “qualia.” Second, machines also don’t seem to ever really be autonomously creative. They can generate new things within an explicit domain, but they don’t invent new domains, nor actively drive experimentation and open-loop discovery.

What may be missing is “embodiment.” Humans and many robots have a model of their own being, their “self,” acting in the world. This self can be simulated in isolation for planning actions, but this simulated model of self is more often coupled to the world by sensors. Using this coupled model, the embodied mind gives causal meaning to the world (choosing where to walk, avoiding danger, observing consequences to its plan), and this gives the embodied mind a sense of meaning in relation to its model of itself.

We believe that such a world-coupled model of itself allows the AI to make metaphors [Lakoff08] that are used to generalize to later experience. When young, for example, humans experience putting things into and taking things out of containers. Later in life, this embodied experience informs what it means to be, say, “in” a garden; in other words, the early experience of playing with containers is used to generalize what it means to be in a garden. Causal experience of the model of self, coupled to the world, allows an entity to attach meaning to things. Such meaning stabilizes perception since categories don’t just come and go—they have causal and time-stable consequences to our simulated model of self within the world. It is complete speculation, but qualia, or subjective experience, may arise from simulating how our model of the world affects our simulated model of the self; that is, we experience our model’s simulated reaction to the world, not the world itself.

Stanley, the robot that won the $2 million prize for the 2005 DARPA Grand Challenge robot race across the desert, used many sensors such as GPS, accelerometers, gyros, laser range finders, and vision to sense the world and fused these sensed results into a computationally efficient “bird’s-eye view” world model. The model consisted of a tilted plane reflecting the general angle of the terrain that was then marked with drivable, un-drivable, and unknown regions derived from the sensor readings. In this world model, Stanley ran physics simulations of itself driving in the general direction of the next few GPS waypoints. The resulting paths were rated to find the most efficient path that would not tip the robot over. Stanley’s brain was sufficient to win the DARPA Grand Challenge, but consider what it wasn’t sufficient to do: it could not represent love, politics, astrophysics, or Shakespeare very well. If Stanley could ask us what a Shakespeare play meant, at best we could say it was something like the boundaries between the drivable and unknown areas in a difficult map. Stanley’s model of itself and its interaction with the world are too sparse for understanding most of the things in the world. It seems obvious that we ape-like beings that live mostly in low-lying temperate watersheds are similarly limited in our ultimate ability to even detect what we don’t know about the universe. In this way, “we are all Stanley.”

We humans find it pretty easy to understand something such as the need for food (a natural part of our model), but we find it extremely difficult to figure out how to create a more intelligent AI or to fathom what qualia is. As another example, if we raised a kitten and had it listen to Shakespeare all day until it was grown, we wouldn’t expect our grown cat to understand a sonnet. If we want to explain to the cat what a sonnet means, the best we could do is use a metaphor from the cat’s natural models, such as, “Shakespeare’s sonnet is like a kitten that inevitably gets lost in bad places.” The cat might think, “Now I understand,” but it has no means by which to even understand what it does not understand! Again, we humans must also be similarly limited. Perhaps we can build more powerful machines to which the problem of qualia is simple. But when we ask the machine to explain it to us, it might get flustered and then finally say, “Qualia is like a kitten that inevitably gets lost in bad places.”

In Stanley the robot, the nature of its perception is entirely in terms of its model. Stanley doesn’t perceive the world; instead, its cameras and sensors transduce signals that populate a causal model of the robot in the world. But the model is only like the real world in terms of the navigational needs of the robot car. By way of another example: in a laptop computer, you might see a GUI and conclude, for example, that there’s a trash can inside the computer since you see one on the screen. A more clever physicist might look closely at the screen and cry out, “Everything is made up of quanta” (pixels)! But, in fact, the GUI is only a causal model to a linear Von Neumann machine reality inside. What is real is that things dragged into the trash are erased—the causal consequences of the model are real. Again, we humans must be similarly limited in what we can know of our own universe, since we’ve inherited a causal model mainly directed at our direct physical and social experience. But our machines may see further.

Today, there is a lot of debate about the dangers of AI. People confuse their metaphors around this. They think the “AI,” the intelligence, is what drives the behaviors of the larger system. But look to ourselves. Our “programming language” as humans isn’t our intellect, but our moods and drives—our emotions! Stanley’s goal was to safely traverse GPS points as fast as possible in the correct order. It found an orderly following of GPS waypoints to be attractive and so that’s what it employed its intelligence to do. Our programming isn’t “thought,” it’s emotion. The emotions guide “what” to do; the intellect guides “how.” The same will be true of our future machines. Design the motives well, and the machines will pursue them.

Ah, but the reader may worry that perhaps those machines will alter the goals given to the next generation of machines? First of all, it will be no easier for a greater machine intelligence to understand and create the minds of a yet greater machine than it is for us to create the first generation. The problem just shifts upward. The intelligent machines also will face the same dangers to themselves from their next evolution of AI that we do from the first evolution, and they will tend to program goals and emotions accordingly. In the end, AIs will be consumed with their own goals, which have evolved from our goals as embodied beings. The danger from AIs is not from the possible malevolence of their goals, but from the nonhuman differences in their goals, which to us may seem like indifference or even hostility. This is what made H. P. Lovecraft’s alien monsters so interesting and frightening in his fiction—they were not so much malevolent as driven by wholly different motives and so wholly indifferent to our fate. We, for example, give little thought to the lives of ants and thereby sometimes bring them to harm.

However, and for the record, the authors don’t fear AI, but rather see it as absolutely essential to solve many of humanity’s vexing problems, such as providing reliable health care for all people, ending hunger, curing diseases, providing and maintaining new energy generation and storage techniques while protecting the environment, and helping run our ever-more-complex world. Rather than a threat, in AI we see purpose. We would not want the spark of self-aware intelligent life to die out with our world, but instead would rather see intelligence grow outward in space and in time. This, in a way, may be (or could be chosen to be¹) humanity’s ultimate purpose, and so is also, hereby, an indirect purpose of OpenCV!

Afterword

We’ve covered a lot of theory and practice in this book, and we’ve described some of the plans for what comes next. Of course, as we’re developing the software, the hardware is also changing. Cameras are now cheaper and more capable than ever, and they have proliferated from cell phones to traffic lights and into factory and home monitoring. A group of manufacturers is aiming to develop cell phone projectors—perfect for robots, because most cell phones are lightweight, low-energy devices whose circuits already include an embedded camera. This opens the way for close-range portable structured light and thereby accurate detailed depth maps, which, together with the development of light field cameras, are just what we need for robot manipulation and 3D object scanning.

Both authors participated in creating the vision system for Stanley, Stanford’s robot racer that won the 2005 DARPA Grand Challenge. In that effort, a vision system coupled with a laser range scanner worked flawlessly for the seven-hour desert road race [Dahlkamp06]. For us, this drove home the power of combining vision with other perception systems: we converted the previously unsolved problem of reliable road perception into a solvable engineering challenge by merging vision with other forms of perception. It is our hope that—by making vision easier to use and more accessible through this book—others can add vision to their own problem-solving tool kits and thus find new ways to solve important problems. That is, with commodity camera hardware, cheap embedded processors, and OpenCV, people can start solving real problems, such as using stereo vision as an automobile backup safety system (or to make automotive improvements in general), monitoring all rail lines for people and vehicles on the tracks, implementing swimming safety measures, building new game controls, developing new security systems, and so on. Be sure to keep an eye on tiny-dnn, a fully featured deep net library with a focus on embedded computing in opencv_contrib. Finally: get hacking!

Computer vision has a rich future ahead, and it seems likely to be one of the key enabling technologies for the 21st century. OpenCV seems likely to be (at least in part) one of the key enabling technologies for computer vision. Endless opportunities for creativity and profound contribution lie ahead. We hope that this book encourages, excites, and enables all who are interested in joining the vibrant computer vision community!

1 There is a sort of spiritual or religious sense to such thoughts, but they have an interesting inversion to “old time” religion. In the past, it was felt that a God chose a people and a purpose. In this new sense, people choose a purpose, which may result in a God-like AI (Google’s algorithms and servers have what is to us almost omniscient knowledge, for example). In this sense, we move from a chosen people who hark back to some glorious past from a diminished present, to a choosing people who look out from a diminished present to a glorious future.
