Apparent Age Estimation from Face Images Combining General and Children-Specialized Deep Learning Models

Orange Labs team: Grigory Antipov (1, 2), Moez Baccouche (1), Sid-Ahmed Berrani (1), Jean-Luc Dugelay (2)
(1) Orange Labs, 4 rue Clos Courtel, 35512 Cesson-Sévigné, France
(2) Eurecom, 450 route des Chappes, 06410 Biot, France


ChaLearn LAP and FotW Challenge and Workshop @ CVPR2016
Winner of Track 1: Apparent Age Estimation
Best Paper Award of the Workshop

Abstract

This work describes our solution to the second edition of the ChaLearn LAP competition on Apparent Age Estimation [1]. Starting from a version of the VGG-16 convolutional neural network (CNN) pretrained for face recognition, we train it on the large IMDB-Wiki dataset for biological age estimation and then fine-tune it for apparent age estimation using the relatively small competition dataset. We show that precise age estimation for children is the cornerstone of the competition. Therefore, our final solution integrates a separate VGG-16 network for the apparent age estimation of children between 0 and 12 years old; this children network is fine-tuned from the general one. We employ different age encoding strategies for training the general and children networks: a soft one (label distribution encoding) for the general network and a strict one (0/1 classification encoding) for the children network. Finally, we highlight the importance of state-of-the-art face detection and face alignment for the final apparent age estimation. Our solution won 1st place in the competition, significantly outperforming the runner-up.
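The emphasis on children follows from how the competition scores predictions. Each image is scored with the ε-error of the challenge [1], 1 − exp(−(x − μ)² / (2σ²)), where x is the predicted age and μ and σ are the mean and standard deviation of the annotators' apparent-age votes, so images with a small σ penalize even small mistakes heavily. The minimal Python sketch that follows assumes, for illustration only, that annotators agree more closely on young faces than on adult ones; the ages and σ values are hypothetical, not competition data.

```python
import math


def epsilon_error(predicted: float, mean: float, std: float) -> float:
    """Per-image apparent-age error: 1 - exp(-(x - mu)^2 / (2 * sigma^2))."""
    return 1.0 - math.exp(-((predicted - mean) ** 2) / (2.0 * std ** 2))


def mean_epsilon_error(predictions, annotations):
    """Average epsilon-error over a set of images; annotations are (mean, std) pairs."""
    errors = [epsilon_error(x, mu, sigma) for x, (mu, sigma) in zip(predictions, annotations)]
    return sum(errors) / len(errors)


# Hypothetical example: the same 2-year mistake on a child whose annotators
# agree closely (small sigma) and on an adult with a larger annotator spread.
child = epsilon_error(predicted=6.0, mean=4.0, std=1.0)
adult = epsilon_error(predicted=42.0, mean=40.0, std=5.0)
print(f"child: {child:.3f}, adult: {adult:.3f}")  # prints child: 0.865, adult: 0.077
```

Under these assumed σ values, the identical absolute error costs the child image more than ten times as much as the adult one, which is why errors on children dominate the averaged score.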

Overview of the solution

Our solution is inspired by that of the winners [2] of the first edition of the ChaLearn LAP competition on Apparent Age Estimation. Like them, we use the Head Hunter algorithm [3] for face detection and the VGG-16 CNN architecture [4] for age estimation. Following [2], we first train VGG-16 CNNs for biological age estimation on the IMDB-Wiki dataset and then fine-tune them for apparent age estimation using the competition data.
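The expected-value ("DEX") decoding proposed in [2] turns a softmax over discrete age bins into a real-valued age by computing Σ p_i · age_i instead of taking the most probable bin. A minimal Python sketch follows; the layout of 101 output bins for ages 0 to 100 and the synthetic logits are assumptions for illustration, since this README does not state the exact output layout of the networks.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_age(probabilities, ages=None):
    """DEX-style decoding: return sum_i p_i * age_i rather than the arg-max bin."""
    if ages is None:
        ages = range(len(probabilities))  # assumption: bin i corresponds to age i years
    return sum(p * a for p, a in zip(probabilities, ages))


# Hypothetical network output over 101 age bins (0..100 years) peaking at 30.
logits = [-abs(age - 30) / 2.0 for age in range(101)]
probs = softmax(logits)
print(round(expected_age(probs), 2))
```

Taking the expectation lets the prediction fall between bins, which matters for a continuous target such as the mean apparent age μ.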

We improve on the previous year's solution [2] with the following novelties:

  1. A separate age estimation model for images of children between 0 and 12 years old.
  2. Combining age encoding strategies: label distribution encoding [5] for the general model and 0/1 classification encoding for the children model.
  3. Integrating a state-of-the-art facial landmark detector [6] for face alignment.
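Novelties 1 and 2 can be sketched together in Python. Here `label_distribution` builds a soft target in the spirit of [5] (a discretised Gaussian around the apparent age), `one_hot` builds the strict 0/1 target, and `combined_estimate` shows one possible way of routing a face to the children model. All of these names, the 101 and 13 bin counts, the σ value, and the rule of switching to the children model when the general estimate is at most 12 years are hypothetical assumptions, not the authors' exact settings.

```python
import math


def label_distribution(age: float, sigma: float, num_bins: int = 101):
    """Soft target (assumed form): discretised Gaussian over bins 0..num_bins-1, summing to 1."""
    weights = [math.exp(-((b - age) ** 2) / (2.0 * sigma ** 2)) for b in range(num_bins)]
    total = sum(weights)
    return [w / total for w in weights]


def one_hot(age: float, num_bins: int = 13):
    """Strict target: all mass on the bin of the age rounded half-up, clipped to 0..num_bins-1."""
    target = [0.0] * num_bins
    index = min(max(int(math.floor(age + 0.5)), 0), num_bins - 1)
    target[index] = 1.0
    return target


CHILD_MAX_AGE = 12  # upper age covered by the children model, as stated in the README


def combined_estimate(image, general_model, children_model):
    """Hypothetical routing rule: defer to the children model for young-looking faces."""
    age = general_model(image)
    if age <= CHILD_MAX_AGE:
        age = children_model(image)
    return age


# Usage with stand-in models (hypothetical callables returning an age in years).
def general_stub(image):
    return 9.5


def children_stub(image):
    return 8.0


soft = label_distribution(age=30.0, sigma=3.0)  # mass spread over neighbouring ages
strict = one_hot(age=7.0)                       # all mass on age 7
print(combined_estimate("face.jpg", general_stub, children_stub))  # prints 8.0
```

The soft target tells the general network that neighbouring ages are nearly correct, whereas the strict target asks the children network to separate adjacent ages sharply, where a one-year error already matters.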

The full training and test pipelines of our solution are presented in Figures 1 and 2, respectively.

Figure 1: Training pipeline.

Figure 2: Test pipeline.

For more details, please consult our paper.

References

[1] S. Escalera, M. Torres, B. Martinez, X. Baro, H. J. Escalante et al. ChaLearn Looking at People and Faces of the World: Face Analysis Workshop and Challenge 2016. Computer Vision and Pattern Recognition Workshop, 2016.
[2] R. Rothe, R. Timofte, L. Van Gool. DEX: Deep EXpectation of apparent age from a single image. International Conference on Computer Vision ChaLearn Looking at People Workshop, 2015.
[3] M. Mathias, R. Benenson, M. Pedersoli, L. Van Gool. Face detection without bells and whistles. European Conference on Computer Vision, 2014.
[4] K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556, 2014.
[5] X. Geng, C. Yin, Z. Zhou. Facial age estimation by learning from label distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
[6] M. Uricar, V. Franc, D. Thomas, A. Sugimoto, V. Hlavac. Real-time multi-view facial landmark detector learned by the structured output SVM. International Conference on Automatic Face and Gesture Recognition, 2015.