Self-Supervised Learning of 3D Human Pose using Multi-view Geometry

Date

2019-01-01

Author

Kocabas, Muhammed
Karagoz, Salih
Akbaş, Emre

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Training accurate 3D human pose estimators requires a large amount of 3D ground-truth data, which is costly to collect. Various weakly supervised or self-supervised pose estimation methods have been proposed to compensate for the lack of 3D data. Nevertheless, in addition to 2D ground-truth poses, these methods require either additional supervision in various forms (e.g., unpaired 3D ground-truth data or a small subset of labels) or the camera parameters in multi-view settings. To address these problems, we present EpipolarPose, a self-supervised learning method for 3D human pose estimation that does not need any 3D ground-truth data or camera extrinsics. During training, EpipolarPose estimates 2D poses from multi-view images and then uses epipolar geometry to obtain a 3D pose and camera geometry, which are subsequently used to train a 3D pose estimator. We demonstrate the effectiveness of our approach on standard benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP), where we set the new state of the art among weakly supervised and self-supervised methods. Furthermore, we propose a new performance measure, Pose Structure Score (PSS), a scale-invariant, structure-aware measure that evaluates the structural plausibility of a pose with respect to its ground truth. Code and pretrained models are available at http://github.com/mkocabas/Epipolarpose
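
As a rough illustration of the step in which 2D poses from two views are lifted to a 3D pose, the sketch below triangulates each joint with the standard linear (DLT) method. It is not the authors' implementation: it assumes the two 3x4 camera projection matrices are already known, whereas EpipolarPose recovers the camera geometry from the 2D poses themselves. The function names (`triangulate_point`, `triangulate_pose`, `project`) are hypothetical and do not come from the linked repository.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X (shape (3,)) with a 3x4 projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one joint seen in two views.

    P1, P2: assumed-known 3x4 projection matrices (an assumption of this sketch).
    x1, x2: 2D pixel coordinates of the same joint in view 1 and view 2.
    """
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def triangulate_pose(P1, P2, pose1, pose2):
    """Triangulate a whole pose: pose1 and pose2 are (J, 2) arrays of 2D joints."""
    return np.array([triangulate_point(P1, P2, a, b) for a, b in zip(pose1, pose2)])


if __name__ == "__main__":
    # Synthetic two-camera setup (all values are illustrative, not from the paper).
    K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]])
    c, s = np.cos(0.3), np.sin(0.3)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    t = np.array([[-1.0], [0.0], [0.2]])
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])

    rng = np.random.default_rng(0)
    joints_3d = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 6.0], size=(17, 3))
    pose1 = np.array([project(P1, X) for X in joints_3d])
    pose2 = np.array([project(P2, X) for X in joints_3d])

    recovered = triangulate_pose(P1, P2, pose1, pose2)
    print("max abs error:", np.abs(recovered - joints_3d).max())
```

With noise-free 2D joints the recovered pose matches the synthetic one up to numerical precision; with detector noise, the triangulated pose would only be an approximation, which is the kind of pseudo ground truth the abstract describes as a training signal.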

Subject Keywords

Face, Gesture and body pose, 3D from single image

URI

https://hdl.handle.net/11511/41853

DOI

https://doi.org/10.1109/cvpr.2019.00117

Collections

Department of Computer Engineering, Conference / Seminar

Suggestions

Frankenstein3d: human body reconstruction from limited number of points Taştan, Oğuzhan; Sahillioğlu, Yusuf; Department of Computer Engineering (2019) We propose a novel approach for reconstructing high-resolution 3D human body models from an extremely small number of 3D points that represent body parts. We leverage a database of high-resolution 3D models of 100 humans who differ from each other in physical attributes such as age, weight, and size. We first divide the bodies in the database into seven semantic regions. Then, for each input region consisting of at most 40 points, we search the database for the best matching body part. For the matching criterion, ...
3D face modeling using multiple images BUYUKATALAY, SONER; Halıcı, Uğur; AKAGUNDUZ, ERDEM; ULUSOY PARNAS, İLKAY (2006-04-19) 3D face modeling based on real images is one of the important subjects of computer vision that has been studied recently. This paper explains the study that we conducted in our Computer Vision and Intelligent Systems Research Laboratory on 3D face model generation using multiple uncalibrated still images.
3D Face Reconstruction Using Stereo Images and Structured Light OZTURK, Ahmet Oguz; Halıcı, Uğur; ULUSOY PARNAS, İLKAY; AKAGUNDUZ, Erdem (2008-04-22) In this paper, we present the 3D face scanner that we developed using stereo cameras and structured light together. Structured light with a pattern of vertical lines is used to create feature points and to match them easily. The 3D point cloud obtained by stereo analysis is post-processed to obtain the 3D model in obj format.
2D-3D feature association via projective transform invariants for model-based 3D pose estimation Gedik, O. Serdar; Alatan, Abdullah Aydın (2012-01-26) The three dimensional (3D) tracking of rigid objects is required in many applications, such as 3D television (3DTV) and augmented reality. Accurate and robust pose estimates enable improved structure reconstructions for 3DTV and reduce jitter in augmented reality scenarios. On the other hand, reliable 2D-3D feature association is one of the most crucial requirements for obtaining high quality 3D pose estimates. In this paper, a 2D-3D registration method, which is based on projective transform invariants, is...
Automatic segmentation of cristae membranes in 3d electron microscopy tomography images using artificial neural networks Karadeniz, Merih Alphan; Mumcuoğlu, Ünal Erkan; Department of Medical Informatics (2016) The Electron Microscopy Tomography (EMT) technique produces 3D images of cells comprising hundreds of slices of high-resolution frames. Segmentation of membranes in these images is necessary in order to reveal the relations between the structural components of the cell and its behaviour. The physical shape of the crista, a membrane of the mitochondria, has been hypothesized to be an early indicator of many diseases or mitochondrial dysfunctions. Automatic segmentation of cristae in EMT images are n...

Citation Formats

M. Kocabas, S. Karagoz, and E. Akbaş, “Self-Supervised Learning of 3D Human Pose using Multi-view Geometry,” 2019, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/41853.