Human pose and shape estimation based on masked mesh modeling from single view RGBD
Date
2023-09-11
Author
Suat, Özhan
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Item Usage Stats: 52 views, 0 downloads
Abstract
This thesis addresses the challenging task of estimating the 3D pose and shape of a human body from a single-view RGBD image. The primary motivation driving this research is to develop a robust method capable of accurately capturing human body shapes and articulations from limited visual cues. To this end, we propose a novel approach that integrates transformer-based models to effectively complete the partial information extracted from single-view RGBD data. A fully supervised approach requires a dataset of RGBD images paired with 3D mesh labels. However, collecting such a dataset is costly and challenging; hence, existing datasets are limited in pose and shape diversity and small in size. To overcome this lack of data, we leverage MoCap datasets to train our network. Our approach involves creating pairs of “partial” point clouds and 3D human body meshes by utilizing body models from MoCap datasets. A partial point cloud simulates the type of depth data that an RGBD camera provides from a single viewpoint. We train our model using these generated pairs. During testing, our method uses 2D visual cues to find correspondences between 3D points generated from RGBD data and vertices on the 3D human body mesh surface. To achieve this, we utilize an off-the-shelf 2D UV map estimator to generate a UV map from an RGB image. By mapping UV map values to the 3D human body model's UV values, we locate body model vertices in the 2D image. The 2D vertex locations are then lifted to 3D space using the depth channel. The key contribution of our method is using transformers to fill in missing details of the human body model, similar to efforts in masked image modeling. Our method effectively recovers the parts of the 3D human body mesh that were not visible, producing a full body mesh. It achieves PVE errors of 40.64 and 83.59 and MPJPE errors of 37.36 and 68.15 on the 3DPW and BEHAVE datasets, respectively, validating the effectiveness of our approach.
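The lifting step described in the abstract (locating body-model vertices in the image via the UV map, then recovering their 3D positions from the depth channel) amounts to standard pinhole back-projection. Below is a minimal sketch, assuming a metric depth map and known camera intrinsics; the function name, array shapes, and NaN convention are illustrative assumptions, not the thesis code:

```python
import numpy as np

def lift_vertices_to_3d(vertex_px, depth, fx, fy, cx, cy):
    """Back-project 2D vertex locations into 3D camera space.

    vertex_px : (N, 2) float array of (u, v) pixel coordinates for the
        body-model vertices located via the UV-map correspondence step.
    depth     : (H, W) depth map in metres from the RGBD frame.
    fx, fy, cx, cy : pinhole camera intrinsics (assumed known).
    Returns an (N, 3) partial point cloud; vertices without valid depth
    (occluded or off-sensor) come back as NaN.
    """
    u = np.clip(vertex_px[:, 0].round().astype(int), 0, depth.shape[1] - 1)
    v = np.clip(vertex_px[:, 1].round().astype(int), 0, depth.shape[0] - 1)
    z = depth[v, u].astype(float)
    z[z <= 0] = np.nan                    # mark missing depth as NaN
    x = (u - cx) * z / fx                 # standard pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)   # (N, 3) partial point cloud
```

In the spirit of the masked-modeling framing above, the NaN-marked vertices correspond to the unseen part of the mesh, which the transformer is trained to reconstruct.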
Subject Keywords
RGBD, 3D human body pose estimation, Transformer
URI
https://hdl.handle.net/11511/105368
Collections
Graduate School of Natural and Applied Sciences, Thesis
Citation Formats
Ö. Suat, “Human pose and shape estimation based on masked mesh modeling from single view RGBD,” M.S. - Master of Science, Middle East Technical University, 2023.