Human pose and shape estimation based on masked mesh modeling from single view RGBD

Suat, Özhan
This thesis addresses the challenging task of estimating the 3D pose and shape of a human body from a single-view RGBD image. The primary motivation behind this research is to develop a robust method capable of accurately capturing human body shape and articulation from limited visual cues. To this end, we propose a novel approach that integrates transformer-based models to effectively complete the partial information extracted from single-view RGBD data. A fully supervised approach requires a dataset of paired RGBD images and 3D mesh labels. However, collecting such a dataset is costly and challenging; hence, existing datasets are small and limited in pose and shape diversity. To overcome this lack of data, we leverage MoCap datasets to train our network. Our approach creates pairs of “partial” point clouds and 3D human body meshes using body models from MoCap datasets. A partial point cloud simulates the type of depth data that an RGBD camera provides from a single viewpoint. We train our model on these generated pairs. At test time, our method uses 2D visual cues to establish correspondences between 3D points derived from the RGBD input and vertices on the 3D human body mesh surface. To achieve this, we employ an off-the-shelf 2D UV map estimator to generate a UV map from the RGB image. By matching the estimated UV values to the UV values of the 3D human body model, we locate body model vertices in the 2D image; these 2D vertex locations are then lifted to 3D space using the depth channel. The key contribution of our method is the use of transformers to fill in missing parts of the human body model, analogous to masked image modeling. Our method effectively recovers parts of the 3D human body mesh that are not visible, producing a full body mesh. It achieves PVE errors of 40.64 and 83.59 and MPJPE errors of 37.36 and 68.15 on the 3DPW and BEHAVE datasets, respectively, validating the effectiveness of our approach.
Citation Formats
Ö. Suat, “Human pose and shape estimation based on masked mesh modeling from single view RGBD,” M.S. - Master of Science, Middle East Technical University, 2023.