Digital media is more pervasive than ever. People place more and more of their personal information on digital devices like laptops, smartphones, and tablets. However, many of these systems do not provide efficient identification and authentication mechanisms to ensure that strangers cannot access your personal data. This is where biometric identification systems come into play, trying to make your data more secure and less vulnerable to malicious people.
These identification systems can be used to lock down your computer, keep people out of a secure room, and so on, but with technology improving every day, we are only one step away from further digitizing our personal lives. How about using your face to unlock your door? How about opening your car with your fingerprint? The possibilities are endless.
Many techniques and algorithms are already available in open source computer vision and machine learning packages like OpenCV to efficiently use these personal identification properties. Of course, this opens up the possibility for enthusiastic computer vision programmers to create many different applications based on these techniques.
In this chapter, we will focus on techniques that use individual biometrics in order to create personal authentication systems that outperform standard available login systems based on passwords. We will take a deeper look at iris and fingerprint recognition, face detection, and face recognition.
We will first discuss the main principles behind each biometric technique, and then we'll show an implementation based on the OpenCV 3 library. For some of the biometrics, we will make use of the available open source frameworks out there. All datasets used to demonstrate the techniques are available for free online for research purposes. However, if you want to apply them to a commercial application, be sure to check their licenses!
Finally, we will illustrate how you can combine several biometric classifications to increase the chance of successfully identifying a specific person based on the probability of the individual biometrics.
At the end of this chapter, you will be able to create a fully functional identification system that will help you to avoid your personal details being stolen by any malicious party out there.
The general idea behind identifying a person using a biometric property is the same for all biometrics out there. There are several steps that we should follow in the correct order if we want to achieve decent results. Moreover, we will point out some major points within these general steps that can considerably improve your recognition rate.
The key to most biometric identification systems is collecting a training dataset that is representative of the problem for which you will actually use the system. Research has demonstrated a phenomenon called dataset bias: if you train a system on a training set recorded with a specific setup, environmental factors, and recording devices, and then apply that system to a test set taken from a completely different setup, with different environmental factors (such as lighting sources) and different recording devices, performance can drop by up to 25%. This is a very large setback, since you want your identification system to run at top performance.
Therefore, there are several things to consider when creating your training set for your identification system:
How to apply this normalization for specific techniques will be discussed in the corresponding subtopics; for example, in the case of face recognition, since it can actually depend a lot on the techniques used. Once you get a good training set, with sufficient samples, you are ready to move to the second step.
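To make this concrete, the following is a minimal sketch of one common normalization step, histogram equalization, which spreads an image's intensities over the full grayscale range so that lighting differences between samples matter less. It is written here in plain NumPy rather than with OpenCV's `cv2.equalizeHist`, and the function name `equalize_histogram` is our own:

```python
import numpy as np

def equalize_histogram(image):
    """Spread a grayscale uint8 image's intensities over the full 0-255 range."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Map each intensity value through the normalized cumulative distribution.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[image]
```

Applying the same normalization to both the training and the test images is what reduces the dataset bias described above.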
Keep in mind that there will be cases where applying constraints is not always a good way to go. Consider a laptop login system based on biometric features that only work with the lights on, like face detection and recognition. Such a system would not work when somebody was working in a dark room. In that case, you should reconsider your application and ensure that there are enough biometric checks that are insensitive to changing light. You could even measure the light intensity yourself through the webcam and disable the face check if you can predict that it will fail.
Simplifying the application and its operating circumstances also simplifies the algorithms discussed in this chapter, leading to better performance in these constrained scenarios.
Once you get the required training data to build your biometric identification system, it is important to find a way to uniquely describe each biometric parameter for each individual. This description is called a "unique feature vector" and it has several benefits compared to the original recorded image:
Again, how you construct the feature descriptor depends on which biometric you want to use to authenticate. Some approaches are based on Gabor filter banks, local binary pattern descriptions, and keypoint descriptors such as SIFT, SURF, and ORB. The possibilities are, again, endless. It all depends on getting the best description for your application. We will make suggestions for each biometric, but a more exhaustive search will need to be done to find the best solution for your application.
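As a hedged illustration of one of these descriptor families, the following sketches a basic 8-neighbor local binary pattern histogram in NumPy. Real implementations (for example, OpenCV's LBPH face recognizer) use refined variants such as uniform patterns and per-cell spatial histograms; the function name here is our own:

```python
import numpy as np

def lbp_histogram(gray):
    """Basic 8-neighbor local binary pattern histogram of a grayscale image.

    Each interior pixel is compared against its 8 neighbors; the resulting
    8-bit code is histogrammed into a 256-bin, L1-normalized feature vector.
    """
    c = gray[1:-1, 1:-1]                      # interior (center) pixels
    codes = np.zeros_like(c, dtype=np.uint8)
    # Offsets of the 8 neighbors, clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = gray[1 + dy:gray.shape[0] - 1 + dy,
                        1 + dx:gray.shape[1] - 1 + dx]
        codes |= (neighbor >= c).astype(np.uint8) << bit
    hist = np.bincount(codes.ravel(), minlength=256)
    return hist / hist.sum()
```

The normalized histogram is the kind of fixed-length feature vector that the machine learning step below expects as input.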
Each feature vector created in step 2 needs to be unique enough that a machine learning technique based on these feature vectors can differentiate between the biometrics of different test subjects. Therefore, it is important to have a descriptor with enough dimensions. Machine learning techniques are far better than humans at separating data in high-dimensional spaces, whereas they tend to fail in low-dimensional feature spaces, where the human brain outperforms them.
Selecting the best machine learning approach is very cumbersome. In principle, different techniques offer similar results, and finding the best one is a game of trial and error. You can apply parameter optimization within each machine learning approach to get even better results. This optimization is too detailed for this chapter; readers interested in it should take a deeper look at hyperparameter optimization techniques.
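To give a feel for what such a search involves, here is a toy sketch: a grid search over the number of neighbors `k` of a simple nearest-neighbor classifier, scored with leave-one-out accuracy. The function names and candidate grid are illustrative, not taken from any library:

```python
import numpy as np

def loo_accuracy(features, labels, k):
    """Leave-one-out accuracy of a k-nearest-neighbor classifier."""
    correct = 0
    for i in range(len(features)):
        dists = np.linalg.norm(features - features[i], axis=1)
        dists[i] = np.inf                      # exclude the held-out sample
        nearest = labels[np.argsort(dists)[:k]]
        votes = np.bincount(nearest)           # majority vote of the k neighbors
        if votes.argmax() == labels[i]:
            correct += 1
    return correct / len(features)

def grid_search_k(features, labels, candidates=(1, 3, 5)):
    """Pick the k with the best leave-one-out score: a toy hyperparameter search."""
    return max(candidates, key=lambda k: loo_accuracy(features, labels, k))
```

Real hyperparameter optimization explores far larger search spaces with smarter strategies (random search, Bayesian optimization), but the trial-and-error principle is the same.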
Some interesting publications on this hyperparameter optimization problem can be found below:
There are many machine learning techniques in OpenCV 3. Some of the most frequently used techniques can be found below in order of complexity:
If you are interested in using neural networks for your classification problems, then take a look at this OpenCV documentation page:
http://docs.opencv.org/master/d0/dce/classcv_1_1ml_1_1ANN__MLP.html
Once you have a machine learning technique that outputs a classification for your input feature vector, you need to retrieve a certainty measure, which tells you how reliable a classification result is. For example, if a given input matches both entry 2 and entry 5 in a database, you will need this certainty to decide which of the two matches you should continue with.
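One simple, hedged way to derive such a certainty, assuming a nearest-neighbor style matcher with at least two database entries, is a ratio test between the best and second-best match distances (in the spirit of Lowe's ratio test for keypoint matching). The function below is an illustrative sketch, not an OpenCV API:

```python
import numpy as np

def match_with_confidence(query, gallery, labels):
    """Return the best-matching label plus a confidence in [0, 1] based on
    the ratio between the best and second-best distances."""
    dists = np.linalg.norm(gallery - query, axis=1)
    order = np.argsort(dists)
    best, second = dists[order[0]], dists[order[1]]
    # Confidence near 1.0 means the best match clearly beats the runner-up;
    # near 0.0 means the two top candidates are practically indistinguishable.
    confidence = 1.0 - best / second if second > 0 else 1.0
    return labels[order[0]], confidence
```

When the confidence falls below a chosen threshold, the system should reject the match rather than guess between near-identical candidates.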
Here, it is also important to think about how your authentication system will operate. It can either be a one-versus-one approach, where you match each database entry with your test sample until you get a high enough matching score, or a one-versus-all approach, where you match the complete database, then look at the retrieved score for each match and take the best match possible.
One-versus-one can be seen as an iterative version of one-versus-all. They usually use the same logic; the difference is in the data structure used during the comparison. The one-versus-all approach requires a more complex way of storing and indexing the data, while one-versus-one uses a more brute-force approach.
Imagine an input test query for your system. Using one-versus-one matching, you would stop searching the database as soon as you found a sufficiently high match. However, if an entry further down the database would have yielded an even higher score, it would never be considered. The one-versus-all approach avoids this, so in many cases it is better to apply one-versus-all.
To give an example of which approach to use in a given case, imagine a door to a secret lab. If you want to check whether a person is allowed to enter the lab at all, then a one-versus-all approach is required: you match the test sample against every database entry and verify that the highest matching score has a certainty above a given threshold. However, if the door only needs to confirm the claimed identity of someone who is already allowed into the room, then a one-versus-one comparison against that person's entry is sufficient.
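A minimal sketch of the one-versus-all strategy, assuming each database entry is stored as a single template vector and using an illustrative distance-based similarity score, might look like this:

```python
import numpy as np

def one_versus_all(query, database, threshold):
    """Scan the whole database, keep the single best match, and accept it
    only when its similarity score clears the acceptance threshold."""
    best_person, best_score = None, -1.0
    for person, template in database.items():
        # Convert the Euclidean distance into a similarity score in (0, 1].
        score = 1.0 / (1.0 + np.linalg.norm(template - query))
        if score > best_score:
            best_person, best_score = person, score
    if best_score >= threshold:
        return best_person, best_score
    return None, best_score        # nobody in the database matched well enough
```

Because every entry is scored before a decision is made, a later, better match can never be missed, which is exactly the advantage over early-stopping one-versus-one matching described above.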
In order to avoid problems with two individuals who have very similar descriptors for a single biometric, multiple biometrics are combined to reduce the occurrence of false positive detections. This will be discussed further at the end of this chapter, when we combine multiple biometrics in an effective authentication system.
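As a small preview of that combination step, a weighted-sum fusion of per-biometric match scores might be sketched as follows. The scores and weights here are illustrative; real systems typically normalize the scores of each biometric to a common range and learn the weights from data:

```python
import numpy as np

def fuse_scores(biometric_scores, weights=None):
    """Weighted-sum fusion of per-biometric match scores, each in [0, 1].

    With no weights given, every biometric contributes equally; otherwise a
    more reliable biometric (e.g., iris) can be weighted above a weaker one.
    """
    scores = np.asarray(biometric_scores, dtype=float)
    weights = np.ones_like(scores) if weights is None else np.asarray(weights, dtype=float)
    return float((scores * weights).sum() / weights.sum())
```

A fused score of, say, a strong fingerprint match and a mediocre face match can still clear the acceptance threshold, while two individuals with similar descriptors for one biometric are unlikely to collide on all of them at once.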