Deep 3D semantic scene extrapolation

2019-02-01
Scene extrapolation is a challenging variant of the scene completion problem, which pertains to predicting the missing part(s) of a scene. While the 3D scene completion algorithms in the literature try to fill in the occluded parts of a scene, such as a chair behind a table, we focus on extrapolating the available half-scene information to a full scene, a problem that, to our knowledge, has not been studied yet. Our approaches are based on convolutional neural networks (CNNs). As input, we take one half of a 3D voxelized scene, and our models produce the other half as output. Our baseline CNN model consists of convolutional and ReLU layers with multiple residual connections, followed by a Softmax classifier trained with a voxel-wise cross-entropy loss. We train and evaluate our models on the synthetic 3D SUNCG dataset. We show that our trained networks can predict the other half of the scenes and complete the objects correctly with suitable lengths. With a discussion of the challenges, we propose scene extrapolation as a challenging test bed for future research in deep learning. We have made our models available at https://github.com/aliabbasi/d3dsse.
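As a rough illustration of the setup described in the abstract, the sketch below splits a voxelized scene into an observed half and a target half, then computes a voxel-wise softmax cross-entropy on the target half. It is a minimal NumPy sketch and not the authors' implementation: the grid size, the number of classes, the split axis, and the helper names `split_scene` and `voxelwise_cross_entropy` are all assumptions made for the example, and the all-zero logits stand in for the output of a network.

```python
import numpy as np


def split_scene(scene, axis=0):
    """Split a voxel grid of integer class labels into two halves along one axis."""
    half = scene.shape[axis] // 2
    first, second = np.split(scene, [half], axis=axis)
    return first, second


def voxelwise_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over all voxels.

    logits: float array of shape (X, Y, Z, C), one score per class per voxel.
    labels: integer array of shape (X, Y, Z) with values in [0, C).
    """
    # Subtract the per-voxel maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the true class at every voxel.
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return float(-picked.mean())


rng = np.random.default_rng(0)
num_classes = 5  # hypothetical number of semantic classes
scene = rng.integers(0, num_classes, size=(8, 6, 4))  # hypothetical labeled voxel grid

# The observed half would be the model input; the other half is the prediction target.
observed, target = split_scene(scene, axis=0)

# Placeholder for network output: uniform scores over classes at every voxel.
logits = np.zeros(target.shape + (num_classes,))
loss = voxelwise_cross_entropy(logits, target)
```

With uniform scores the loss equals the logarithm of the number of classes, which gives a convenient sanity check before plugging in a real model.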
VISUAL COMPUTER

Suggestions

Texture and edge preserving multiframe super-resolution
Turgay, Emre; Akar, Gözde (Institution of Engineering and Technology (IET), 2014-09-01)
Super-resolution (SR) image reconstruction refers to methods where a higher resolution image is reconstructed using a set of overlapping aliased low-resolution observations of the same scene. Although edge preservation has been a widely explored topic in SR literature, texture-specific regularisation has recently gained interest. In this study, texture-specific regularisation is handled as a post-processing step. A two stage method is proposed, comprising multiple SR reconstructions with different regularis...
Discretization of Parametrizable Signal Manifolds
Vural, Elif (Institute of Electrical and Electronics Engineers (IEEE), 2011-12-01)
Transformation-invariant analysis of signals often requires the computation of the distance from a test pattern to a transformation manifold. In particular, the estimation of the distances between a transformed query signal and several transformation manifolds representing different classes provides essential information for the classification of the signal. In many applications, the computation of the exact distance to the manifold is costly, whereas an efficient practical solution is the approximation of ...
3D indirect shape retrieval based on hand interaction
Irmak, Erdem Can; Sahillioğlu, Yusuf (Springer Science and Business Media LLC, 2020-01-01)
In this work, we present a novel 3D indirect shape analysis method which successfully retrieves 3D shapes based on hand-object interaction. To this end, the human hand information is first transferred to the virtual environment by the Leap Motion controller. Position-, angle- and intersection-based novel features of the hand and fingers are used for this part. In the guidance of these features that define the way humans grab objects, a support vector machine (SVM) classifier is trained. Experiments validate...
Relative consistency of projective reconstructions obtained from the same image pair
Otlu, Burcak; Atalay, Mustafa Ümit; Hassanpour, Reza (World Scientific Pub Co Pte Lt, 2006-08-01)
This study obtains projective reconstructions of an object or a scene from its image pair and measures relative consistency of these projective reconstructions. 3D points are estimated from an image pair using projective and epipolar geometry. Two measures are presented for verification of projective reconstructions with each other. These measures are based on the equality of ratios between the x-, y- and z-coordinates of 3D reconstructed points which are obtained from the same corresponding points. This in...
Disparity disambiguation by fusion of signal- and symbolic-level information
Ralli, Jarno; Diaz, Javier; Kalkan, Sinan; Krueger, Norbert; Ros, Eduardo (Springer Science and Business Media LLC, 2012-01-01)
We describe a method for resolving ambiguities in low-level disparity calculations in a stereo-vision scheme by using a recurrent mechanism that we call signal-symbol loop. Due to the local nature of low-level processing it is not always possible to estimate the correct disparity values produced at this level. Symbolic abstraction of the signal produces robust, high confidence, multimodal image features which can be used to interpret the scene more accurately and therefore disambiguate low-level interpretat...
Citation Formats
A. Abbasi, S. Kalkan, and Y. Sahillioğlu, “Deep 3D semantic scene extrapolation,” VISUAL COMPUTER, pp. 271–279, 2019, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/44875.