2D/3D human pose estimation using deep convolutional neural nets

Date

2019

Author

Kocabaş, Muhammed

In this thesis, we propose algorithms to estimate 2D/3D human pose from single-view images. In the first part of the thesis, we present MultiPoseNet, a novel bottom-up multi-person pose estimation architecture that combines a multi-task model with a novel assignment method. MultiPoseNet can jointly handle person detection, keypoint detection, person segmentation and pose estimation problems. The novel assignment method is implemented by the Pose Residual Network (PRN), which receives keypoint and person detections and produces accurate poses by assigning keypoints to person instances. On the COCO keypoints dataset, our pose estimation method outperforms all previous bottom-up methods in both accuracy (+4-point mAP over the previous best result) and speed; it also performs on par with the best top-down methods while being at least 4x faster. Our method is the fastest real-time system, running at ~23 frames/sec.
In the second part of the thesis, we present EpipolarPose, a self-supervised training methodology for single-person monocular human pose estimation, and Pose Structure Score (PSS), a structure-aware performance measure for 3D human pose estimation. Training accurate 3D human pose estimators requires a large amount of 3D ground-truth data, which is costly to collect. Various weakly or self-supervised pose estimation methods have been proposed to cope with the lack of 3D data. Nevertheless, these methods, in addition to 2D ground-truth poses, require either additional supervision in various forms (e.g. unpaired 3D ground-truth data, a small subset of labels) or the camera parameters in multi-view settings. To address these problems, EpipolarPose learns 3D human pose estimation without any 3D ground-truth data or camera extrinsics. During training, EpipolarPose estimates 2D poses from multi-view images and then uses epipolar geometry to obtain a 3D pose and camera geometry, which are subsequently used to train a 3D pose estimator. We demonstrate the effectiveness of our approach on standard benchmark datasets, namely Human3.6M and MPI-INF-3DHP, where we set the new state of the art among weakly/self-supervised methods. PSS is a scale-invariant, structure-aware measure that evaluates the structural plausibility of a pose with respect to its ground truth.
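To make the multi-view step more concrete, the following is a minimal sketch of how 2D joint detections from two views can be lifted to 3D by linear (DLT) triangulation. It is not the thesis implementation: the abstract states that EpipolarPose needs no camera extrinsics, whereas this sketch assumes the two projection matrices are already available and only illustrates the triangulation itself. The function names (`triangulate_point`, `triangulate_pose`, `project`) and the toy camera setup are hypothetical.

```python
import numpy as np


def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one keypoint seen in two views.

    P1, P2: 3x4 camera projection matrices (assumed known in this sketch).
    x1, x2: (u, v) image coordinates of the same joint in each view.
    Returns the 3D point in the frame shared by the projection matrices.
    """
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The least-squares solution is the right singular vector of A
    # associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def triangulate_pose(P1, P2, pose1, pose2):
    """Triangulate every joint of a 2D pose pair; each pose has shape (J, 2)."""
    return np.array([triangulate_point(P1, P2, a, b) for a, b in zip(pose1, pose2)])


def project(P, X):
    """Project (J, 3) points with a 3x4 matrix and return (J, 2) coordinates."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]


# Toy setup (an assumption): identity intrinsics, second camera shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
joints_3d = np.array([[0.0, 0.0, 5.0], [0.5, -0.2, 4.0], [-0.3, 0.4, 6.0]])

# Project the joints into both views, then recover them by triangulation.
recovered = triangulate_pose(P1, P2, project(P1, joints_3d), project(P2, joints_3d))
print(np.allclose(recovered, joints_3d))
```

With noise-free projections the triangulated joints should match the original 3D points up to numerical precision; with real 2D detections the result is only a least-squares estimate.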

Subject Keywords

Neural networks (Computer science), Convolutions (Mathematics), Three-dimensional imaging

URI

http://etd.lib.metu.edu.tr/upload/12622955/index.pdf
https://hdl.handle.net/11511/27978

Collections

Graduate School of Natural and Applied Sciences, Thesis

Citation Formats

M. Kocabaş, “2D/3D human pose estimation using deep convolutional neural nets,” M.S. - Master of Science, Middle East Technical University, 2019.