Journal «Language & Science» UTMN.


No. 8, 2019. 05.00.00 TECHNICAL SCIENCES


About the authors:

Bezhenar Aleksandr Vasilevich, Bachelor's student, University of Tyumen
Sizova Lyudmila Vladimirovna, Senior Lecturer, University of Tyumen


Monocular head pose estimation requires learning a model that computes the intrinsic Euler angles of the pose (yaw, pitch, roll) from an input image of a human face. Annotating ground-truth head pose angles for images in the wild is difficult and requires ad hoc fitting procedures. This highlights the need for approaches that can train on data captured in a controlled environment and generalize to images in the wild (with varying facial appearance and illumination). The authors of the article propose using a higher-level representation to regress the head pose with deep learning architectures. More specifically, they use uncertainty maps in the form of 2D soft localization heatmap images over five facial keypoints (left ear, right ear, left eye, right eye, and nose) and pass them through a convolutional neural network to regress the head pose. The authors report head pose estimation results on two challenging benchmarks, BIWI and AFLW.
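
The heatmap representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the isotropic Gaussian blobs stand in for the soft localization output of a keypoint detector, and the function name `keypoint_heatmaps`, the channel order, the 64x64 resolution, and the `sigma` value are all assumptions made for illustration. The resulting five-channel array is the kind of input a convolutional network could take to regress yaw, pitch, and roll.

```python
import numpy as np

# Channel order is an assumption made for this sketch.
KEYPOINTS = ("left_ear", "right_ear", "left_eye", "right_eye", "nose")


def keypoint_heatmaps(points, size=64, sigma=2.0):
    """Render one soft localization heatmap per facial keypoint.

    points: dict mapping keypoint name -> (x, y) in pixel coordinates,
            or None when the keypoint is not visible.
    Returns a float32 array of shape (5, size, size) with values in [0, 1].
    """
    ys, xs = np.mgrid[0:size, 0:size]  # row (y) and column (x) index grids
    maps = np.zeros((len(KEYPOINTS), size, size), dtype=np.float32)
    for i, name in enumerate(KEYPOINTS):
        p = points.get(name)
        if p is None:
            continue  # hidden keypoint: its channel stays all zeros
        x, y = p
        # Isotropic Gaussian centred on the keypoint (hypothetical stand-in
        # for a detector's uncertainty map).
        maps[i] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    return maps


# Hypothetical keypoint locations for a face turned so the right ear is hidden.
example = {
    "left_ear": (12, 32),
    "right_ear": None,
    "left_eye": (24, 28),
    "right_eye": (40, 28),
    "nose": (32, 38),
}
heatmaps = keypoint_heatmaps(example)
```

Because the network sees only these maps and not raw pixels, the input carries no information about skin tone or lighting, which is the intuition behind training in a controlled environment and testing on images in the wild.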

