Estimation of partially occluded human joints using a Bayesian approach and an application of human image inpainting

2021-02-01
Dursun, Ahmet Anıl
Human pose estimation is a well-known computer vision task with applications in surveillance, computerized outfit planners, and video special effects. State-of-the-art pose estimators are based on CNN architectures and use visual features obtained from single images. However, occlusions are prevalent in natural scenes, and the performance of CNN-based estimators degrades significantly under occlusion. In this thesis, a novel Bayesian approach, BAKE (Bayesian Approach for occluded Keypoint Estimation), is presented to estimate the positions of occluded human joints in a given video sequence. The approach uses a well-known CNN-based pose estimator, OpenPose [1], to detect the visible human joints in a given image and then applies a Bayesian framework to complete the missing pose elements. It can be regarded as an alternative to 3D CNN structures for embedding the information in time-dependent event sequences. The problem is ill-conditioned, since it is in general not possible to accurately complete the missing joints for an arbitrary occlusion of the human body. However, it is possible to localize some missing joints within certain regions, with a certain confidence, based on a priori information about the human skeleton. This a priori information is obtained from the previous frames of the video sequence. A statistical human body model is generated by defining joint length and angle parameters. The parameters of the model are calculated from the non-occluded frames of the video sequence, which provide local information, while a human pose database is used to obtain general joint statistics. Then, on a partially occluded video frame, the body length and angle distributions are updated using the visible joints. These length and angle distributions are used to estimate the occluded joints.
A new confidence score is also proposed and used to develop a hybrid technique that combines the predictions of OpenPose and the proposed method, BAKE. Several experiments are performed to compare the outputs of OpenPose, BAKE, and the hybrid approach. It is shown that BAKE outperforms OpenPose in general and that the hybrid method yields a slight improvement over BAKE. In addition to BAKE, an inpainting method for partially occluded human video frames is proposed. In this method, non-occluded images of the target person obtained from the video sequence are used with a 3D body reconstruction algorithm, SMPLify-X [2]. Image patches are transferred from the non-occluded images to the occluded parts after a matching process, and the corresponding results are shown.
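As a rough illustration of the idea summarized above, the following Python sketch shows one way a length or angle distribution could be updated and then used to place an occluded joint. It is not the thesis's implementation: the 1-D Gaussian model, the conjugate update with known observation variance, the rule of keeping the higher-confidence prediction, and all names (`Gaussian`, `update`, `estimate_child_joint`, `hybrid_choice`) and numbers are assumptions made only for this example.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Gaussian:
    """1-D Gaussian belief over a scalar parameter (limb length or angle)."""
    mean: float
    var: float


def update(prior: Gaussian, observations: Sequence[float], obs_var: float) -> Gaussian:
    """Conjugate Gaussian update of the mean, assuming a known observation variance."""
    n = len(observations)
    if n == 0:
        return prior
    precision = 1.0 / prior.var + n / obs_var
    mean = (prior.mean / prior.var + sum(observations) / obs_var) / precision
    return Gaussian(mean, 1.0 / precision)


def estimate_child_joint(parent_xy: Tuple[float, float],
                         length: Gaussian,
                         angle: Gaussian) -> Tuple[float, float]:
    """Place an occluded joint at the mean length and angle from its visible parent."""
    x, y = parent_xy
    return (x + length.mean * math.cos(angle.mean),
            y + length.mean * math.sin(angle.mean))


def hybrid_choice(detector_xy, detector_conf, model_xy, model_conf):
    """Hypothetical hybrid rule: keep whichever prediction has the higher confidence."""
    return detector_xy if detector_conf >= model_conf else model_xy


# Illustrative numbers only (pixels and radians), not taken from the thesis.
database_prior = Gaussian(mean=50.0, var=100.0)            # general joint statistics
forearm_length = update(database_prior, [58.0, 60.0, 62.0], obs_var=25.0)
forearm_angle = Gaussian(mean=math.pi / 2, var=0.1)        # belief from earlier frames
wrist_xy = estimate_child_joint((100.0, 200.0), forearm_length, forearm_angle)
```

In this sketch the database prior plays the role of the general joint statistics and the list of measured lengths plays the role of the non-occluded frames. The posterior variance shrinks as more frames are observed, which is the kind of quantity a confidence score could be built on.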

Citation Formats
A. A. Dursun, “Estimation of partially occluded human joints using a Bayesian approach and an application of human image inpainting,” M.S. - Master of Science, Middle East Technical University, 2021.