Face detection techniques are used today in many practical and commercial applications, including video surveillance, biometric security, robotics, medicine, classification and indexing of images and videos, and human-machine interfaces. Face detection is also the first step in more complex systems such as face recognition or emotion detection.
These complex systems are used by a large number of industries such as automotive, retail, product marketing, banking/finance, home automation, and security.
Thus, the actual performance of a complex system often depends on the performance of the individual building blocks that make it up.
Understanding and improving the performance of face detection is therefore particularly important for implementing the complex applications that use it as a building block.
It may appear that the simplest way to assess system performance is to test the system in the field. However, comprehensive field testing is expensive, and it cannot cover all the different scenarios in a short amount of time. Thus, there is a need for an automated system to quantify system robustness. Pi Innovate has filled this gap by creating a simulator called the ‘Pi Simulator’.
What are we talking about when we talk about the performance of a face detection system?
In-depth performance analysis is called robustness testing. We can look at the performance of a face detection system from two aspects: performance in terms of speed of execution, or performance in terms of its effectiveness or the success rate.
The first case, system speed, is fairly easy to address by sizing the computing capabilities according to the constraints of the application. For example, Apple has used dedicated electronics to make its Face ID system capable of real-time operation. In other cases, cloud resources can be used to reserve computing power when it is needed. Finally, in some cases real-time detection is not necessary, and a system with modest computing power may be enough. Computing power is therefore often not a bottleneck for face detection systems, so our focus has to be on the second type of performance evaluation, i.e. the success rate.
The performance in terms of success rate, or the system robustness, is more complex to quantify because it depends on the quality of the detection algorithm itself. This is why it is necessary to characterize this performance for the different solutions available and to consider how to improve it.
What is the real-world performance of the different face detection engines?
Face detection is a complex issue because the practical cases are innumerable: different facial orientations, the presence of distinctive features (glasses, mustaches, etc.), differences in lighting conditions, climate, number of people in the image, etc.
Thus, there are significant differences between the various libraries and APIs available on the market, and these differences vary from one particular case to another.
In order to precisely characterize these differences, Pi Innovate has developed a method and a tool, the Pi Simulator, which plots performance curves according to a dozen parameters for any face detection library or API.
For example, consider Figure 2a, shown below.
From the above image, the Pi Simulator creates a series of degraded images according to different parameters such as blur, brightness, climate, and noise. For example, the Pi Simulator creates the following type of images (from a much larger set):
These different sets of degraded images are then tested with the different face detection libraries, and the success rate of each library is plotted as a curve.
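The procedure described above (degrade the images by a controlled amount, run a detector, record the success rate at each level) can be illustrated with a minimal sketch. This is not the Pi Simulator's implementation, which is not described here. The function names, the grayscale list-of-rows image representation, and the toy detector are all assumptions made for illustration; a real test would plug in an actual face detection library and real photographs.

```python
import random

def adjust_brightness(image, factor):
    """Scale every pixel by `factor` (1.0 = unchanged), clamped to 0..255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]

def add_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    rng = random.Random(seed)
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in image]

def robustness_curve(images, detector, degrade, levels):
    """Return (level, success_rate) pairs for one detector and one degradation.

    `detector` is any callable returning True when it finds a face;
    `degrade` is any callable taking (image, level).
    """
    curve = []
    for level in levels:
        hits = sum(1 for img in images if detector(degrade(img, level)))
        curve.append((level, hits / len(images)))
    return curve

# Hypothetical stand-in for a real detector: it "succeeds" only when the
# image is bright enough. It exists solely to make the sketch runnable.
def toy_detector(image):
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels) >= 60

# Four synthetic 8x8 grayscale images of uniform intensity (placeholder data).
images = [[[value] * 8 for _ in range(8)] for value in (80, 120, 160, 200)]

curve = robustness_curve(images, toy_detector, adjust_brightness, [1.0, 0.5, 0.25])
print(curve)  # [(1.0, 1.0), (0.5, 0.75), (0.25, 0.0)]
```

Passing `add_noise` instead of `adjust_brightness` (with noise levels such as 0, 10, 20) would produce the corresponding curve for the noise parameter, and repeating the call for each detector gives one curve per library.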
In the following figures, typical robustness curves are shown, and the above images are positioned according to their level of brightness, blur or rain.
As can be seen, the different algorithms generally have comparable performance under normal conditions, but widely disparate performance when conditions deviate from normal. This is the main reason why one needs to conduct experiments and understand the robustness of each algorithm under varying conditions.
The algorithm in blue is better in low light, the algorithm in green is better on blurred images, and the algorithm in red is better at detecting faces in the rain.
Why such differences?
These face detection engines are based on machine learning. This learning is done from a training data set, and the choice of this data set partly determines the performance of the system. In other words, if an engine has been trained on images reproducing "normal" conditions, then it will be optimized for these conditions.
Moreover, when building a machine learning system, one must choose between two options: a very specialized engine that performs very well but only under restricted conditions, or a more generic engine that works in most conditions but whose performance is not optimal in all cases.
What are the consequences?
Since the robustness of each algorithm differs under varying operating conditions, it is complicated to build a system capable of detecting faces with a good level of reliability in all conditions.
It is time consuming, tedious, and expensive to fully characterize the robustness performance of the different algorithms available according to the actual conditions that will be encountered by the vision system.
Thus, in most cases the algorithm is chosen more or less "blindly", and the real system performance becomes known only once the system is in use, where it is sometimes below what was expected. Such an approach is detrimental to gaining customer confidence.
How can the system performance be improved?
Starting from the above findings, two methods emerge for improving the robustness of a face detection system:
1. Precisely characterize the real conditions of use and select the algorithm best suited to the largest number of cases. This solution improves performance but probably leaves special cases untreated.
2. Use a dynamic system capable of detecting image conditions and dynamically selecting the algorithm best suited for these conditions.
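As an illustration of the first method, the selection can be framed as a weighted average: each algorithm's success rate under a given condition is weighted by how often that condition is expected to occur in the field. The sketch below is one possible formulation, not a method stated in this article. The function names, the success rates, and the condition frequencies are all hypothetical values chosen for the example.

```python
def expected_success(rates, condition_weights):
    """Average success rate, weighted by how often each condition occurs."""
    return sum(weight * rates[cond] for cond, weight in condition_weights.items())

def select_best(algorithms, condition_weights):
    """Return the name of the algorithm with the highest expected success rate."""
    return max(
        algorithms,
        key=lambda name: expected_success(algorithms[name], condition_weights),
    )

# Hypothetical success rates per condition (illustrative numbers only).
algorithms = {
    "blue":  {"normal": 0.95, "low_light": 0.80, "blur": 0.50, "rain": 0.40},
    "green": {"normal": 0.95, "low_light": 0.50, "blur": 0.85, "rain": 0.45},
    "red":   {"normal": 0.94, "low_light": 0.45, "blur": 0.55, "rain": 0.80},
}

# Hypothetical share of field images falling under each condition (sums to 1).
condition_weights = {"normal": 0.60, "low_light": 0.30, "blur": 0.05, "rain": 0.05}

best = select_best(algorithms, condition_weights)
print(best)  # blue
```

With these made-up numbers the "blue" algorithm wins because low light is the most frequent degraded condition, yet it still performs poorly on the rarer rain and blur cases, which is the limitation noted in the first item above.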
The second approach has two advantages: it does not require preparatory work, and it covers a large number of cases by dynamically selecting the best algorithm.
However, it requires an additional technological building block: a system for detecting the conditions of the image. This is why Pi Innovate is developing this technology and will deliver it to the market in the next couple of months.
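To make the dynamic approach concrete, here is a minimal sketch of a condition detector that routes each image to a matching detection algorithm. It does not represent Pi Innovate's technology, which is not described in this article. The two heuristics (mean intensity for low light, average difference between neighboring pixels for blur), the thresholds, and the placeholder detectors are assumptions chosen for simplicity. Conditions such as rain would require a more capable classifier.

```python
def mean_brightness(image):
    """Average pixel intensity of a grayscale image given as a list of rows."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

def sharpness(image):
    """Mean absolute difference between horizontal neighbors (low = blurry)."""
    diffs = [abs(row[i + 1] - row[i]) for row in image for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs) if diffs else 0.0

def classify_condition(image, dark_threshold=60, blur_threshold=5):
    """Label the image condition; both thresholds are arbitrary assumptions."""
    if mean_brightness(image) < dark_threshold:
        return "low_light"
    if sharpness(image) < blur_threshold:
        return "blur"
    return "normal"

def detect_faces(image, detectors, default="normal"):
    """Run the detector registered for the image's condition."""
    condition = classify_condition(image)
    detector = detectors.get(condition, detectors[default])
    return condition, detector(image)

# Placeholder detectors: each just reports which engine would have been used.
detectors = {
    "normal": lambda image: "general-purpose detector",
    "low_light": lambda image: "low-light detector",
    "blur": lambda image: "blur-tolerant detector",
}

dark = [[10, 20], [15, 25]]       # low mean intensity
soft = [[128, 129], [128, 129]]   # bright but almost no local contrast
crisp = [[0, 255], [255, 0]]      # bright with strong local contrast

for name, image in (("dark", dark), ("soft", soft), ("crisp", crisp)):
    print(name, detect_faces(image, detectors))
```

In a real system, each dictionary entry would wrap the library that the robustness curves showed to be strongest under that condition.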