***Content written in this blog by Elaine Lin from ZKTeco***
The development of computer vision recognition has greatly benefited from recent advancements in data storage, computing power, and algorithm execution. Further contributing to its rapid development is the broad range of hardware and software technologies available, including Arm Neon SIMD instructions on both 32-bit and 64-bit (AArch64) architectures. We leverage these technologies to further enhance the performance of our near-infrared and visible-light hybrid biometric matching platform, which consists of facial, palm, vehicle-feature, and iris recognition devices.
Following the knowledge distillation method, we optimized our facial recognition technology for an Arm-based embedded platform. Knowledge distillation consists of first training a large facial feature extraction network with high accuracy, then designing a smaller network that meets the operational speed requirements. The small network learns from the large one while maintaining accuracy. By significantly reducing the number of parameters, we increase performance by up to 20x. We further shrink parameter storage to a quarter of its original size through quantization-aware training, which stores parameters as int8 data. Many embedded inference frameworks, such as TFLite, NCNN, MNN, TNN, and Tengine, support the int8 data type and deliver highly efficient performance when combined with the Arm processor's Neon SIMD instructions.
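To illustrate why int8 storage cuts model size to a quarter, here is a minimal sketch of symmetric per-tensor quantization in NumPy. It is illustrative only: production frameworks such as TFLite and MNN use per-channel scales and calibration data, and quantization-aware training learns the scales during training rather than computing them afterwards.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: map float32 weights to int8.

    Returns the int8 tensor and the scale needed to dequantize.
    """
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# A float32 weight tensor stored as int8 takes a quarter of the space.
w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
print(q.nbytes / w.nbytes)                      # 0.25
# Round-off error per weight is bounded by half a quantization step.
print(float(np.abs(dequantize(q, s) - w).max()) <= s / 2 + 1e-7)  # True
```

On Arm cores, the int8 tensors then feed Neon dot-product instructions in the inference framework, which is where the speedup over float32 comes from.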
By combining these methodologies and embedding Arm-based general-purpose application processors in our devices, we have achieved extremely fast, high-precision facial recognition, which can be used in various business applications, including time & attendance, access control, and entrance control systems.
Facial recognition technology provides people fast, secure, and convenient access to physical barriers and electronic devices, including employee time clocks. However, facial recognition is sometimes not acceptable to customers with privacy concerns. In this case, an excellent alternative is near-infrared and visible-light palm recognition technology. Palm recognition provides equally fast, secure, and convenient access while raising no privacy concerns, since people's palm prints are seldom found in the public domain.
Deployed on the Arm embedded platform, we have developed near-infrared palm and visible-light palm recognition systems that can be used either separately or in combination. The combined “bimodal” configuration is highly competitive with existing face recognition systems in terms of both precision and accuracy while not raising privacy concerns.
Figure 1. Palm vein image
We first trained a palm detection model using an improved RetinaNet algorithm to detect the palm position and to capture 9 key points from the image. Then, following key-point detection, we use an affine transformation to align the palm image to a standard size of 224 x 196. This image is then input to the feature extraction network. To increase speed, we use a half-width MobileNetV2 as the feature extraction network, trained on hundreds of thousands of palm vein images to extract features efficiently. Finally, we fabricated various fake palm prostheses and used images of them as negative samples, with real palm images as positive samples. This method effectively trains an anti-spoofing model, which prevents prosthesis attacks once the palm recognition device is deployed in the commercial market.
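The alignment step above can be sketched as a least-squares affine fit: given the 9 detected key points and their canonical positions in the 224 x 196 template, solve for the 2x3 transform that maps one onto the other. The template coordinates below are hypothetical, not ZKTeco's actual key-point layout, and a real pipeline would then warp the image with this matrix (e.g. via OpenCV's `warpAffine`).

```python
import numpy as np

def estimate_affine(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 2x3 affine transform mapping src points to dst.

    src, dst: (N, 2) arrays of corresponding key points (here N = 9).
    """
    n = src.shape[0]
    A = np.hstack([src, np.ones((n, 1))])        # homogeneous coordinates
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)  # solves A @ M ~= dst
    return M.T                                   # (2, 3) affine matrix

def apply_affine(M: np.ndarray, pts: np.ndarray) -> np.ndarray:
    return pts @ M[:, :2].T + M[:, 2]

# Hypothetical canonical positions of 9 key points in a 224 x 196 template.
template = np.array([[x, y] for y in (30.0, 98.0, 166.0)
                             for x in (40.0, 112.0, 184.0)])

# Simulated detections: the template rotated, scaled, and shifted.
theta = 0.2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
detected = template @ (1.3 * R).T + np.array([15.0, -8.0])

M = estimate_affine(detected, template)
aligned = apply_affine(M, detected)
print(np.allclose(aligned, template, atol=1e-6))  # True
```

With 9 correspondences the system is over-determined, so the least-squares fit also smooths out small detection noise on individual key points.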
We have achieved great success in performing neural network inferences on the MNN framework with optimized SIMD instructions on AArch64. The entire process can run within 100ms on a typical Arm chip, such as MT6739 (Arm Cortex-A53).
Visible-light palm recognition utilizes the unique, lifelong texture patterns printed on each palm.
Figure 2. Visible-light palm image
Visible-light palm recognition is similar to near-infrared recognition: the algorithm first detects the palm outline, then aligns the image and extracts its features. We insert an anti-spoofing network into this process. Since a visible-light palm image is more easily disturbed by ambient light when the palm posture changes, the network structure has more design variations.
We optimized the hybrid palm recognition system for various business applications, including time & attendance, access control, and entrance control. The combination of the two algorithms dramatically improves the recognition accuracy and overall performance of the whole system.
Transportation services have become more intelligent. With the rapid development of deep-learning technology in recent years, deep neural networks have become the most important practical tool for complex visual tasks, such as vehicle characterization, vehicle detection, and vehicle tracking. However, today's deep neural networks are highly complex and demand substantial computation and storage. These demanding requirements have limited the performance of deep-learning models on embedded devices.
However, we can solve this problem by optimizing the deep-learning model. Optimization effectively reduces the number of model parameters and the computing workload, making the model fit real-world deployed devices. We implemented this deep-learning-based vehicle feature recognition on the Hi3516AV200 (an Arm-architecture SoC) to demonstrate the effects of the model optimization.
Figure 3. Deployment of a vehicle feature recognition model on Hi3516AV200, Arm Cortex-A7
The key vehicle features include logo, model, and body color. The vehicle feature recognition model is trained with the Darknet framework, an open-source C framework with excellent portability. Deep-learning models trained with Darknet port well to Arm devices for the relevant vision tasks.
The following principles guide the optimization of vehicle recognition tasks:
1) Ensure that the recognition accuracy meets the requirements of the task.
2) Compress the network parameters and reduce the model size by adjusting the network structure and the number of convolutional cores.
3) Ensure that the model works in real-time.
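Principle 2 works because a convolution's parameter count scales with the product of its input and output channel counts, so shrinking the widths compounds. The backbone widths below are a hypothetical example, not the actual vehicle-recognition network:

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weight count of a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

# Hypothetical backbone: channel widths of successive conv layers.
widths = [3, 32, 64, 128, 256, 512]
original = sum(conv_params(a, b) for a, b in zip(widths, widths[1:]))

# Shrinking every internal width by a factor r cuts each layer's
# parameters by roughly r**2, since both c_in and c_out shrink.
r = 4
slim = [3] + [w // r for w in widths[1:]]
reduced = sum(conv_params(a, b) for a, b in zip(slim, slim[1:]))

print(original / reduced)  # roughly r**2 = 16x fewer parameters
```

This quadratic effect is how width and structure adjustments alone can deliver the roughly 100x parameter reduction shown in Table 1 while accuracy is preserved by retraining.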
| Model | Size (MB) | Parameters (million) | Speed (fps) | Accuracy (%) |
| --- | --- | --- | --- | --- |
| Initial | 59.7 | 14.925 | 4 | 99.5 |
| Optimized | 0.58 | 0.145 | 17 | 99.4 |

Table 1. Comparison of models before and after optimization.
We fine-tuned the network structure and its parameters, making large reductions in model size and computation to fit the computing power and memory of Arm devices while maintaining recognition accuracy.
The following shows the effect of the vehicle feature recognition model in a real-world application:
Figure 4. Arm-based vehicle feature recognition demo
As seen in Table 1 and Figure 4, our optimized model is easily deployable to Arm devices and achieves high-performance results.
When someone says, “you have beautiful eyes,” they are probably referring to your iris, which sits between the white sclera and the pupil. The iris carries as much as 65% of the eye's texture information, despite occupying only 55% of the eye's surface area. The iris consists of numerous crypts, wrinkles, and pigmented spots, and is unique to each person. Genetic factors determine the iris's formation: the expression of human DNA decides its biological form, color, and actual appearance. After the first eight months of growth, a person's iris normally reaches sufficient size and enters a relatively stable period, remaining unchanged for decades. This uniqueness and stability make the iris a strong foundation for identity verification.
Traditionally, most iris recognition algorithms run on PC platforms due to their sizable computational workload. We have ported our iris recognition algorithm to an embedded Arm platform with a single-core 1.2 GHz CPU, such as the RK3288 (Arm Cortex-A17): feature extraction runs in 200 ms, and matching against 5,000 enrolled irises takes less than 200 ms.
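Matching thousands of irises in under 200 ms is plausible because the classic comparison is extremely cheap: in Daugman-style systems, each iris is encoded as a binary "iris code," and two codes are compared by the fraction of bits that differ (normalized Hamming distance). The sketch below uses conventional illustrative values (a 2048-bit code and a 0.32 decision threshold), not ZKTeco's actual parameters:

```python
import numpy as np

BITS = 2048
rng = np.random.default_rng(0)

# Simulated gallery of 5000 enrolled iris codes.
gallery = rng.integers(0, 2, size=(5000, BITS), dtype=np.uint8)

# A probe: enrolled code #1234 with ~5% of bits flipped to mimic
# capture noise between two imaging sessions.
probe = gallery[1234].copy()
flip = rng.choice(BITS, size=100, replace=False)
probe[flip] ^= 1

# One vectorized pass: normalized Hamming distance to every template.
distances = np.mean(gallery != probe, axis=1)
best = int(np.argmin(distances))
print(best, distances[best] <= 0.32)  # 1234 True
```

Unrelated iris codes disagree on about half their bits, so genuine matches (here ~0.05) separate cleanly from impostors; on Arm, the XOR-and-popcount inner loop maps directly onto Neon instructions.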
We believe that every computer vision technology should be convenient, especially in biometric recognition use cases. Only by improving the safety and security of the application while providing a satisfactory user experience can computer vision technology be promoted more broadly throughout the world. Multimodal, touchless, and anti-spoofing capabilities are the primary trends in the future development of computer vision recognition technology. Based on these trends, we have released multiple hybrid recognition solutions, including but not limited to the following:
[CTAToken URL = "https://www.zkteco.com" target="_blank" text="Visit ZKTeco" class ="green"]
If you have any questions, please do not hesitate to contact Elaine Lin at elaine.lin@zkteco.com.