Abstract
Feature extractors significantly impact the performance of biometric systems. In the field of hand gesture authentication, existing studies focus on improving model architectures and behavioral characteristic representation methods to enhance their feature extractors. However, loss functions, which can guide extractors to produce more discriminative identity features, have been neglected. In this paper, we improve the margin-based Softmax loss functions, which were mainly designed for face authentication, in two aspects to form a new loss function for hand gesture authentication. First, we propose to replace the cosine function commonly used in margin-based Softmax losses with a linear function to measure the similarity between identity features and proxies (the rows of the Softmax weight matrix, which can be viewed as class centers). With the linear function, the main gradient magnitude decreases monotonically as the model quality improves during training, allowing the model to be optimized quickly in the early stage and fine-tuned precisely in the late stage. Second, we design an adaptive margin scheme that assigns margin penalties to different samples according to their separability and the model quality in each iteration. Our adaptive margin scheme constrains the gradient magnitude: it suppresses radical (excessively large) gradients and maintains moderate (not too small) gradients for model optimization, contributing to more stable training. The linear function and the adaptive margin scheme are complementary; combining them yields the proposed linear adaptive additive angular margin (L3AM) loss. To demonstrate the effectiveness of L3AM loss, we conduct extensive experiments on seven hand-related authentication datasets, compare it with 25 state-of-the-art (SOTA) loss functions, and apply it to eight SOTA hand gesture authentication models.
The experimental results show that L3AM loss further improves the performance of the eight authentication models and outperforms the 25 losses. The code is available at https://github.com/SCUT-BIP-Lab/L3AM.
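To make the two ingredients above concrete, here is a minimal NumPy sketch of a margin Softmax loss with a linear angular similarity. It is an illustration only, not the paper's implementation: the function name, the scale `s`, and the fixed margin `m` are placeholder assumptions, whereas the actual L3AM loss assigns an adaptive, per-sample margin.

```python
import numpy as np

def linear_angular_margin_loss(features, proxies, labels, s=30.0, m=0.35):
    """Illustrative sketch: Softmax loss with a linear (in the angle)
    similarity and an additive angular margin. Hyperparameters s and m
    are placeholders, not the paper's values; L3AM's margin is adaptive."""
    # L2-normalize identity features and class proxies (Softmax weight rows).
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    cos = np.clip(f @ w.T, -1.0 + 1e-7, 1.0 - 1e-7)   # (N, C) cosines
    theta = np.arccos(cos)                             # angles in [0, pi]
    # Linear similarity: maps theta in [0, pi] to a score in [-1, 1].
    logits = 1.0 - (2.0 / np.pi) * theta
    # Additive margin on the target-class logit only.
    logits[np.arange(len(labels)), labels] -= m
    # Numerically stable scaled log-softmax cross-entropy.
    z = s * logits
    z -= z.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(labels)), labels].mean()
```

Because the score is linear in the angle rather than its cosine, a fixed angular improvement changes the logit by the same amount anywhere in \([0, \pi]\), which underlies the monotone gradient behavior described above.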
Data Availability
The datasets used are available in the literature (Liu et al., 2020; Hao et al., 2007; Zhang et al., 2010, 2017). The proposed method can be downloaded from https://github.com/SCUT-BIP-Lab/L3AM.
References
Aumi, M. T. I., & Kratz, S. G. (2014). Airauth: Evaluating in-air hand gestures for authentication. In MobileHCI ’14.
Bai, Y., Zou, Q., Chen, X., Li, L., Ding, Z., & Chen, L. (2023). Extreme low-resolution action recognition with confident spatial–temporal attention transfer. International Journal of Computer Vision, 131(6), 1550–1565.
Bajaber, A., Fadel, M., & Elrefaei, L. A. (2022). Evaluation of deep learning models for person authentication based on touch gesture. Computer Systems Science and Engineering, 42, 465–481.
Boutros, F., Damer, N., Kirchbuchner, F., & Kuijper, A. (2022). Elasticface: Elastic margin loss for deep face recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1578–1587).
Carreira, J., & Zisserman, A. (2017). Quo vadis, action recognition? A new model and the kinetics dataset. In 2017 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 4724–4733).
Chan, F.K.-S., Li, X., & Kong, A.W.-K. (2017). A study of distinctiveness of skin texture for forensic applications through comparison with blood vessels. IEEE Transactions on Information Forensics and Security, 12, 1900–1915.
Chopra, S., Hadsell, R., & LeCun, Y. (2005). Learning a similarity metric discriminatively, with application to face verification. In 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05) (Vol. 1, pp. 539–5461).
Deng, J., Guo, J., Yang, J., Xue, N., Kotsia, I., & Zafeiriou, S. (2022). Arcface: Additive angular margin loss for deep face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10), 5962–5979.
Farnebäck, G. (2003). Two-frame motion estimation based on polynomial expansion. In Scandinavian conference on image analysis (pp. 363–370). Springer.
Ferrer, M. A., & Morales, A. (2011). Hand-shape biometrics combining the visible and short-wave infrared bands. IEEE Transactions on Information Forensics and Security, 6(4), 1305–1314.
Han, C., Shan, S., Kan, M., Wu, S., & Chen, X. (2022). Personalized convolution for face recognition. International Journal of Computer Vision, 1–19.
Hao, Y., Sun, Z., & Tan, T. (2007). Comparative studies on multispectral palm image fusion for biometrics. In Asian conference on computer vision. Springer.
He, L., Wang, Z., Li, Y., & Wang, S. (2020). Softmax dissection: Towards understanding intra-and inter-class objective for embedding learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, pp. 10957–10964).
Huang, Y., Wang, Y., Tai, Y., Liu, X., Shen, P., Li, S., Li, J., & Huang, F. (2020). Curricularface: Adaptive curriculum learning loss for deep face recognition. In 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR) (pp. 5900–5909).
Imura, S., & Hosobe, H. (2018). A hand gesture-based method for biometric authentication. In HCI.
Jiang, X., Liu, X., Fan, J., Ye, X., Dai, C., Clancy, E. A., & Chen, W. (2022). Measuring neuromuscular electrophysiological activities to decode hd-semg biometrics for cross-application discrepant personal identification with unknown identities. IEEE Transactions on Instrumentation and Measurement, 71, 1–15.
Jiang, X., Xu, K., Liu, X., Dai, C., Clifton, D. A., Clancy, E. A., Akay, M., & Chen, W. (2021). Neuromuscular password-based user authentication. IEEE Transactions on Industrial Informatics, 17, 2641–2652.
Jiao, J., Liu, W., Mo, Y., Jiao, J., Deng, Z.-L., & Chen, X. (2021). Dyn-arcface: Dynamic additive angular margin loss for deep face recognition. Multimedia Tools and Applications, 80, 25741–25756.
Li, Q., Luo, Z., & Zheng, J. (2022). A new deep anomaly detection-based method for user authentication using multichannel surface emg signals of hand gestures. IEEE Transactions on Instrumentation and Measurement, 71, 1–11.
Liu, C., Kang, W., Fang, L., & Liang, N. (2019). Authentication system design based on dynamic hand gesture. In CCBR.
Liu, C., Yang, Y., Liu, X., Fang, L., & Kang, W. (2020). Dynamic-hand-gesture authentication dataset and benchmark. IEEE Transactions on Information Forensics and Security, 16, 1550–1562.
Liu, H., Dai, L., Hou, S., Han, J., & Liu, H. (2019). Are mid-air dynamic gestures applicable to user identification? Pattern Recognition Letters, 117, 179–185.
Liu, H., Zhu, X., Lei, Z., & Li, S. (2019). Adaptiveface: Adaptive margin and sampling for face recognition. In 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR) (pp. 11939–11948).
Liu, F., Liu, G., Zhang, W., Wang, L., & Shen, L. (2022). A novel high-resolution fingerprint representation method. IEEE Transactions on Biometrics, Behavior, and Identity Science, 4(2), 289–300.
Liu, W., Lin, R., Liu, Z., Liu, L., Yu, Z., Dai, B., & Song, L. (2018). Learning towards minimum hyperspherical energy. Advances in Neural Information Processing Systems, 31, 1–12.
Liu, W., Wen, Y., Raj, B., Singh, R., & Weller, A. (2023). Sphereface revived: Unifying hyperspherical face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2), 2458–2474.
Liu, W., Wen, Y., Yu, Z., Li, M., Raj, B., & Song, L. (2017). Sphereface: Deep hypersphere embedding for face recognition. In 2017 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 6738–6746).
Matkowski, W. M., Chai, T., & Kong, A. W. K. (2020). Palmprint recognition in uncontrolled and uncooperative environment. IEEE Transactions on Information Forensics and Security, 15, 1601–1615.
Meng, Q., Zhao, S., Huang, Z., & Zhou, F. (2021). Magface: A universal representation for face recognition and quality assessment. In 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR) (pp. 14220–14229).
Peng, G., Zhou, G., Nguyen, D. T., Qi, X., Yang, Q., & Wang, S. (2017). Continuous authentication with touch behavioral biometrics and voice on wearable glasses. IEEE Transactions on Human-Machine Systems, 47, 404–416.
Sae-Bae, N., Memon, N. D., Isbister, K., & Ahmed, K. (2014). Multitouch gesture-based authentication. IEEE Transactions on Information Forensics and Security, 9, 568–582.
Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In 2015 IEEE Conference on computer vision and pattern recognition (CVPR) (pp. 815–823).
Shen, C., Wang, Z., Si, C., Chen, Y., & Su, X. (2020). Waving gesture analysis for user authentication in the mobile environment. IEEE Network, 34, 57–63.
Shirazi, A. S., Moghadam, P., Ketabdar, H., & Schmidt, A. (2012). Assessing the vulnerability of magnetic gestural authentication to video-based shoulder surfing attacks. In Proceedings of the SIGCHI conference on human factors in computing systems.
Simonyan, K., & Zisserman, A. (2014). Two-stream convolutional networks for action recognition in videos. In NIPS.
Song, W., Fang, L., Lin, Y., Zeng, M., & Kang, W. (2022). Dynamic hand gesture authentication based on improved two-stream cnn. In CCBR.
Song, W., & Kang, W. (2023). Depthwise temporal non-local network for faster and better dynamic hand gesture authentication. IEEE Transactions on Information Forensics and Security, 18, 1870–1883.
Song, W., Kang, W., & Lin, L. (2023). Hand gesture authentication by discovering fine-grained spatiotemporal identity characteristics. IEEE Transactions on Circuits and Systems for Video Technology, 34(1), 461–474.
Song, W., Kang, W., Wang, L., Lin, Z., & Gan, M. (2022). Video understanding-based random hand gesture authentication. IEEE Transactions on Biometrics, Behavior, and Identity Science, 4(4), 453–470.
Song, W., Kang, W., Yang, Y., Fang, L., Liu, C., & Liu, X. (2021). Tds-net: Towards fast dynamic random hand gesture authentication via temporal difference symbiotic neural network. In 2021 IEEE international joint conference on biometrics (IJCB) (pp. 1–8).
Song, W., Kang, W., & Zhang, Y. (2023). Understanding physiological and behavioral characteristics separately for high-performance video-based hand gesture authentication. IEEE Transactions on Instrumentation and Measurement, 72, 1–13.
Sun, J., Yang, W., Xue, J.-H., & Liao, Q. (2020). An equalized margin loss for face recognition. IEEE Transactions on Multimedia, 22, 2833–2843.
Sun, Y., Cheng, C., Zhang, Y., Zhang, C., Zheng, L., Wang, Z., & Wei, Y. (2020). Circle loss: A unified perspective of pair similarity optimization. In 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR) (pp. 6397–6406).
Sun, Z., Wang, Y., Qu, G., & Zhou, Z. (2016). A 3-d hand gesture signature based biometric authentication system for smartphones. Security and Communication Networks, 9, 1359–1373.
Supančič, J. S., Rogez, G., Yang, Y., Shotton, J., & Ramanan, D. (2018). Depth-based hand pose estimation: Methods, data, and challenges. International Journal of Computer Vision, 126, 1180–1198.
Tolosana, R., Vera-Rodríguez, R., Fierrez, J., & Ortega-Garcia, J. (2021). Deepsign: Deep on-line signature verification. IEEE Transactions on Biometrics, Behavior, and Identity Science, 3, 229–239.
Venkat, I., & Wilde, P. D. (2010). Robust gait recognition by learning and exploiting sub-gait characteristics. International Journal of Computer Vision, 91, 7–23.
Wang, H., Chen, T., Liu, X., & Chen, J. (2020). Exploring the hand and finger-issued behaviors toward natural authentication. IEEE Access, 8, 55815–55825.
Wang, H., Wang, Y., Zhou, Z., Ji, X., Li, Z., Gong, D., Zhou, J., & Liu, W. (2018). Cosface: Large margin cosine loss for deep face recognition. In 2018 IEEE/CVF conference on computer vision and pattern recognition (pp. 5265–5274).
Wang, F., Xiang, X., Cheng, J., & Yuille, A. L. (2017). Normface: L2 hypersphere embedding for face verification. In Proceedings of the 25th ACM international conference on Multimedia.
Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., & Gool, L. V. (2019). Temporal segment networks for action recognition in videos. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41, 2740–2755.
Wang, P., Li, W., Ogunbona, P., Wan, J., & Escalera, S. (2018). Rgb-d-based human motion recognition with deep learning: A survey. Computer Vision and Image Understanding, 171, 118–139.
Wang, X., Zhang, S., Wang, S., Fu, T., Shi, H., & Mei, T. (2019). Mis-classified vector guided softmax loss for face recognition. In AAAI conference on artificial intelligence.
Wang, X., & Tanaka, J. (2018). Gesid: 3D gesture authentication based on depth camera and one-class classification. Sensors, 18, 3265.
Wen, Y., Liu, W., Weller, A., Raj, B., & Singh, R. (2022). Sphereface2: Binary classification is all you need for deep face recognition. In International conference on learning representations.
Wen, Y., Zhang, K., Li, Z., & Qiao, Y. (2016). A discriminative feature learning approach for deep face recognition. In European conference on computer vision.
Wong, A. M. H., Furukawa, M., & Maeda, T. (2020). Robustness of rhythmic-based dynamic hand gesture with surface electromyography (semg) for authentication. Electronics, 9, 2143.
Wong, A. M. H., & Kang, D.-K. (2016). Stationary hand gesture authentication using edit distance on finger pointing direction interval. Scientific Programming, 2016, Article ID 7427980.
Wu, J., Christianson, J., Konrad, J., & Ishwar, P. (2015). Leveraging shape and depth in user authentication from in-air hand gestures. In 2015 IEEE international conference on image processing (ICIP) (pp. 3195–3199).
Wu, J., Ishwar, P., & Konrad, J. (2016). Two-stream cnns for gesture-based verification and identification: Learning user style. In 2016 IEEE conference on computer vision and pattern recognition workshops (CVPRW) (pp. 110–118).
Xiao, D., Li, J., Li, J., Dong, S., & Lu, T. (2022). Ihem loss: Intra-class hard example mining loss for robust face recognition. IEEE Transactions on Circuits and Systems for Video Technology, 32, 7821–7831.
Yu, X., Zhou, Z., Xu, M., You, X., & Li, X. (2020). Thumbup: Identification and authentication by smartwatch using simple hand gestures. In 2020 IEEE international conference on pervasive computing and communications (PerCom) (pp. 1–10).
Zhang, C., Zou, Y., Chen, G., & Gan, L. (2019). Pan: Persistent appearance network with an efficient motion cue for fast action recognition. In Proceedings of the 27th ACM international conference on multimedia.
Zhang, D., Guo, Z., Lu, G., Zhang, L., & Zuo, W. (2010). An online system of multispectral palmprint verification. IEEE Transactions on Instrumentation and Measurement, 59, 480–490.
Zhang, L., Li, L., Yang, A. J., Shen, Y., & Yang, M. (2017). Towards contactless palmprint recognition: A novel device, a new benchmark, and a collaborative representation based identification approach. Pattern Recognition, 69, 199–212.
Zhang, W., Chen, Y., Yang, W., Wang, G., Xue, J.-H., & Liao, Q. (2020). Class-variant margin normalized softmax loss for deep face recognition. IEEE Transactions on Neural Networks and Learning Systems, 32, 4742–4747.
Zhang, X., Zhao, R., Qiao, Y., Wang, X., & Li, H. (2019). Adacos: Adaptively scaling cosine logits for effectively learning deep face representations. In 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR) (pp. 10815–10824).
Zhang, Y., Herdade, S., Thadani, K., Dodds, E., Culpepper, J., & Ku, Y.-N. (2023). Unifying margin-based softmax losses in face recognition. In 2023 IEEE/CVF winter conference on applications of computer vision (WACV) (pp. 3537–3546).
Zhao, H., Shi, Y., Tong, X., Ying, X., & Zha, H. (2020). Qamface: Quadratic additive angular margin loss for face recognition. In 2020 IEEE international conference on image processing (ICIP) (pp. 1901–1905).
Zhao, K., Xu, J., & Cheng, M.-M. (2019). Regularface: Deep face recognition via exclusive regularization. In 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR) (pp. 1136–1144).
Zhong, Y., Deng, W., Hu, J., Zhao, D., Li, X., & Wen, D. (2021). Sface: Sigmoid-constrained hypersphere loss for robust face recognition. IEEE Transactions on Image Processing, 30, 2587–2598.
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 62376100 and 61976095, the Natural Science Foundation of Guangdong Province of China under Grant No. 2022A1515010114, and China Scholarship Council under Grant No. 202206150104.
Author information
Authors and Affiliations
Corresponding authors
Additional information
Communicated by Sergio Escalera.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Proofs
1.1 Proof of Gradient Magnitudes
The gradients of \({\mathscr {L}}\) with respect to \({\varvec{W}}\) and \({\varvec{x}}\) in Eq. 2 are
where
\(\partial \cos \theta _j / \partial {\varvec{W}}_j\) and \(\partial \cos \theta _j / \partial {\varvec{x}}\) \((j=1,2,\ldots ,C)\) each contain a portion of the gradient magnitude and the main gradient direction, and can be further calculated as
The gradient magnitudes of Eqs. A8 and A9 are
Hence, we obtain the magnitudes of the C gradients with respect to the proxies \({\varvec{W}}_j\) (\(j=1,2,\ldots ,C\)) and the magnitudes of the C gradient components (each associated with a proxy \({\varvec{W}}_j\)) with respect to \({\varvec{x}}\). These magnitudes can be divided into two types, associated with the corresponding proxy \({\varvec{W}}_{l}\) (\(\cos \theta _{l}=\hat{{\varvec{W}}}_{l}\hat{{\varvec{x}}}\)) and with the non-corresponding proxies \({\varvec{W}}_{o}\) (\(\cos \theta _{o}=\hat{{\varvec{W}}}_o\hat{{\varvec{x}}}\), \(o \ne l\)). The two types of gradient magnitudes with respect to \({\varvec{W}}\) are
The two types of gradient-component magnitudes with respect to \({\varvec{x}}\) are
1.2 Proof of Linear Similarity Measurement Function
If we take \(\frac{\partial f(\theta _j)}{\partial \cos \theta _j}=\frac{1}{\sin \theta _j}\) \((j=1,2,\ldots ,C)\), we get
Note that \(\cos \theta _o\) is a special case of \(f(\theta _o)\) (\(m_0=m_1=1\) and \(m_2=m_3=0\); see Eq. 2). Thus, the modulation gradient magnitudes of the proposed L3AM loss (Eq. 15) are equal to one (\(M_{l}=M_{o}=1\)).
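The condition \(\partial f(\theta _j)/\partial \cos \theta _j = 1/\sin \theta _j\) follows directly from the chain rule for any similarity that is linear in the angle. A brief sketch, assuming the illustrative form \(f(\theta _j)=c-\theta _j\) (a form consistent with the condition, not necessarily the exact parameterization of Eq. 15):

```latex
\[
\frac{\partial \cos \theta _j}{\partial \theta _j} = -\sin \theta _j
\;\;\Longrightarrow\;\;
\frac{\partial \theta _j}{\partial \cos \theta _j} = -\frac{1}{\sin \theta _j},
\qquad
\frac{\partial f(\theta _j)}{\partial \cos \theta _j}
= \frac{\partial f}{\partial \theta _j}\cdot
  \frac{\partial \theta _j}{\partial \cos \theta _j}
= (-1)\cdot\Bigl(-\frac{1}{\sin \theta _j}\Bigr)
= \frac{1}{\sin \theta _j}.
\]
```

The factor \(1/\sin \theta _j\) exactly cancels the \(\sin \theta _j\) arising in the gradient magnitudes of Appendix A.1, which is why the modulation magnitudes reduce to \(M_{l}=M_{o}=1\).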
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Song, W., Kang, W., Kong, A.W.K. et al. L3AM: Linear Adaptive Additive Angular Margin Loss for Video-Based Hand Gesture Authentication. Int J Comput Vis (2024). https://doi.org/10.1007/s11263-024-02068-w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11263-024-02068-w