Investigating Depth Domain Adaptation for Efficient Human Pose Estimation
A. Martinez-Gonzalez, M. Villamizar, O. Canevet and J-M. Odobez
European Conference on Computer Vision Workshop (ECCVW) - 2018

Abstract- Convolutional Neural Networks (CNN) are the leading models for human body landmark detection from RGB vision data. However, as such models require high computational load, an alternative is to rely on depth images which, due to their more simple nature, can allow the use of less complex CNNs and hence can lead to a faster detector. As learning CNNs from scratch requires large amounts of labeled data, which are not always available or expensive to obtain, we propose to rely on simulations and synthetic examples to build a large training dataset with precise labels. Nevertheless, the final performance on real data will suffer from the mismatch between the training and test data, also called domain shift between the source and target distributions. Thus in this paper, our main contribution is to investigate the use of unsupervised domain adaptation techniques to fill the gap in performance introduced by these distribution differences. The challenge lies in the important noise differences (not only gaussian noise, but many missing values around body limbs) between synthetic and real data, as well as the fact that we address a regression task rather than a classification one. In addition, we introduce a new public dataset of synthetically generated depth images to cover the cases of multi-person pose estimation. Our experiments show that domain adaptation provides some improvement, but that further network fine tuning with real annotated data is worth including to supervise the adaptation process.