3-D structure assisted reference view generation for H.264 based multi-view video coding

2007-06-13
Gedik, O. Serdar
Oezkalayci, Burak
Alatan, Abdullah Aydın
A 3D geometry-based multi-view video coding (MVC) method is proposed. In order to exploit the spatial redundancies between multiple views, the scene geometry is estimated as dense depth maps. The dense depth estimation problem is modeled as a Markov random field (MRF) and solved via the belief propagation algorithm. Relying on these depth maps of the scene, novel view estimates of the intermediate views of the multi-view set are obtained with a 3D warping algorithm, which also performs hole-filling in the occlusion regions. The proposed MVC method, based on the H.264 standard, encodes a number of reference views in a standard manner, whereas the residuals of the novel view predictions are encoded separately. The proposed MVC method is compared against the well-known JMVM compression algorithm, yielding competitive performance while additionally providing 3D structure information of the observed scene.
IEEE 15th Signal Processing and Communications Applications Conference
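
The warping, hole-filling, and residual steps described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes rectified cameras, works on a single scanline, and uses the simplified relation disparity = (focal length × baseline) / depth. All function and parameter names (`warp_scanline`, `fill_holes`, `prediction_residual`, `focal_times_baseline`) are hypothetical, and the background-neighbor hole-filling rule is one common heuristic, not necessarily the one used in the paper.

```python
def warp_scanline(colors, depths, focal_times_baseline):
    """Forward-warp one scanline of a reference view into a neighboring view.

    Assumes rectified cameras, so each pixel moves horizontally by
    disparity = focal_times_baseline / depth (rounded to whole pixels).
    Returns the warped scanline (None marks a hole) and its depth buffer.
    """
    width = len(colors)
    warped = [None] * width
    zbuf = [float("inf")] * width
    for x in range(width):
        disparity = round(focal_times_baseline / depths[x])
        tx = x - disparity  # target camera assumed to sit to the right
        # Z-buffer test: when two pixels collide, the nearer one wins.
        if 0 <= tx < width and depths[x] < zbuf[tx]:
            warped[tx] = colors[x]
            zbuf[tx] = depths[x]
    return warped, zbuf


def fill_holes(warped, zbuf):
    """Fill each hole from its nearest valid neighbor that is farther away.

    Disoccluded regions usually belong to the background, so the neighbor
    with the larger depth is preferred (a heuristic assumed for this sketch).
    """
    width = len(warped)
    filled = list(warped)
    for x in range(width):
        if warped[x] is not None:
            continue
        left = next((i for i in range(x - 1, -1, -1) if warped[i] is not None), None)
        right = next((i for i in range(x + 1, width) if warped[i] is not None), None)
        candidates = [i for i in (left, right) if i is not None]
        if candidates:
            source = max(candidates, key=lambda i: zbuf[i])
            filled[x] = warped[source]
        else:
            filled[x] = 0  # nothing was warped onto this scanline
    return filled


def prediction_residual(actual, predicted):
    """Per-pixel difference that a residual coder would compress."""
    return [a - p for a, p in zip(actual, predicted)]


if __name__ == "__main__":
    colors = [10, 20, 30, 40, 50, 60, 70, 80]
    depths = [4, 4, 4, 2, 2, 4, 4, 4]  # pixels 3 and 4 are a foreground object
    warped, zbuf = warp_scanline(colors, depths, focal_times_baseline=4)
    predicted = fill_holes(warped, zbuf)
    actual = [20, 40, 50, 55, 60, 70, 80, 90]  # made-up target scanline
    print(warped)
    print(predicted)
    print(prediction_residual(actual, predicted))
```

In this toy scanline the foreground pixels shift by two positions and the background by one, so a hole opens just to the right of the foreground object and is filled from the background side. Only the small residual between the filled prediction and the actual view would then remain to be encoded.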

Suggestions

Multi-view video coding via dense depth estimation
Oezkalayci, Burak; Gedik, O. Serdar; Alatan, Abdullah Aydın (2007-05-09)
A geometry-based multi-view video coding (MVC) method is proposed. In order to exploit the spatial redundancies between multiple views, the scene geometry is estimated as dense depth maps. The dense depth estimation problem is modeled as a Markov random field (MRF) and solved via the belief propagation algorithm. Relying on these depth maps of the scene, novel view estimates of the intermediate views of the multi-view set are obtained with a 3D warping algorithm, which also performs hole-filling in the...
Relaxed Spatio-Temporal Deep Feature Aggregation for Real-Fake Expression Prediction
Ozkan, Savas; Akar, Gözde (2017-10-29)
Frame-level visual features are generally aggregated in time with techniques such as LSTM, Fisher Vectors, NetVLAD, etc. to produce a robust video-level representation. Here we introduce a learnable aggregation technique whose primary objective is to retain the short-time temporal structure between frame-level features and their spatial interdependencies in the representation. It can also be easily adapted to cases where training samples are very scarce. We evaluate the method on a real-fake expr...
Multi-camera video surveillance : detection, occlusion handling, tracking and event recognition
Akman, Oytun; Alatan, Abdullah Aydın; Department of Electrical and Electronics Engineering (2007)
In this thesis, novel methods for background modeling, tracking, occlusion handling and event recognition via multi-camera configurations are presented. As the initial step, the building blocks of typical single-camera surveillance systems, namely moving object detection, tracking and event recognition, are discussed, and various widely accepted methods for these building blocks are tested to assess their performance. Next, for the multi-camera surveillance systems, background modeling, occlusion handling, tr...
A hierarchical object localization and image retrieval framework
Uysal, Mutlu; Yarman Vural, Fatoş Tunay; Department of Computer Engineering (2006)
This thesis proposes an object localization and image retrieval framework, which trains a discriminative feature set for each object class. For this purpose, a hierarchical learning architecture, together with a Neighborhood Tree, is introduced for object labeling. Initially, a large variety of features is extracted from the regions of the pre-segmented images. These features are then fed to the training module, which selects the "best set of representative features", suppressing relatively less important...
Multiple description coding of animated meshes
Bici, M. Oguz; Akar, Gözde (Elsevier BV, 2010-11-01)
In this paper, we propose three novel multiple description coding (MDC) methods for reliable transmission of compressed animated meshes represented by a series of 3D static meshes with the same connectivity. The proposed methods trade off reconstruction quality for error resilience to provide the best expected reconstruction of the 3D mesh sequence at the decoder side. The methods are based on layer duplication and partitioning of the set of vertices of a scalable coded animated mesh by either spatial or temporal sub...
Citation Formats
O. S. Gedik, B. Oezkalayci, and A. A. Alatan, “3-D structure assisted reference view generation for H.264 based multi-view video coding,” presented at the IEEE 15th Signal Processing and Communications Applications Conference, Eskisehir, Turkey, 2007, Accessed: 2020. [Online]. Available: https://hdl.handle.net/11511/55476.