Deep 3D semantic scene extrapolation

Abbasi, Ali
In this thesis, we study the problem of 3D scene extrapolation with deep models. Scene extrapolation is a challenging variant of the scene completion problem, which pertains to predicting the missing part(s) of a scene. While the 3D scene completion algorithms in the literature try to fill in the occluded parts of a scene, such as a chair behind a table, we focus on extrapolating the available half of a scene to the full scene, a problem that, to our knowledge, has not been studied yet. Our approaches are based on generative adversarial networks (GANs) and convolutional neural networks (CNNs). Our models take one half of a voxelized 3D scene as input and generate the other half as output. Our baseline CNN model consists of convolutional and ReLU layers with multiple residual connections, followed by a softmax classifier trained with a voxel-wise cross-entropy loss. We use the baseline CNN model as the generator network in the proposed GAN model. We regularize the GAN model with a discriminator consisting of two internal networks, one local and one global, to obtain sharper and more realistic results. The local discriminator takes only the generated half of a scene, while the global discriminator takes both the generated half and the real input half, in order to distinguish between real and fake scenes. Building on the CNN model, we also propose a hybrid model, which takes the top-view projection of the input scene in parallel with the 3D input. We train and evaluate our models on the synthetic 3D SUNCG dataset. We show that our trained networks can predict the other half of the scenes and complete the objects correctly, with suitable lengths. After discussing the challenges, we propose scene extrapolation as a challenging testbed for future research in deep learning.
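The input/output split and the voxel-wise cross-entropy loss described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the split axis, the toy grid size, the three class labels, and the function names `split_scene` and `voxelwise_cross_entropy` are all assumptions made for illustration, and uniform logits stand in for the output of a trained network.

```python
import math


def split_scene(scene):
    """Split a voxel grid (nested lists indexed [x][y][z]) into two halves.

    Assumption: the scene is cut in the middle of the first axis.
    """
    mid = len(scene) // 2
    return scene[:mid], scene[mid:]


def softmax(logits):
    """Numerically stable softmax over one voxel's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def voxelwise_cross_entropy(logits, labels):
    """Mean over all voxels of -log(softmax(logits)[true_label])."""
    total, count = 0.0, 0
    for plane_logits, plane_labels in zip(logits, labels):
        for row_logits, row_labels in zip(plane_logits, plane_labels):
            for voxel_logits, label in zip(row_logits, row_labels):
                total -= math.log(softmax(voxel_logits)[label])
                count += 1
    return total / count


# Toy 4x2x2 scene with three hypothetical classes (0 = empty, 1 = floor, 2 = chair).
scene = [[[(x + y + z) % 3 for z in range(2)] for y in range(2)] for x in range(4)]
observed_half, target_half = split_scene(scene)

# Stand-in for a network prediction: identical logits for every class.
uniform_logits = [[[[0.0, 0.0, 0.0] for _ in row] for row in plane]
                  for plane in target_half]
uniform_loss = voxelwise_cross_entropy(uniform_logits, target_half)

# A prediction that favours the true class in every voxel gives a lower loss.
confident_logits = [[[[5.0 if c == label else 0.0 for c in range(3)]
                      for label in row] for row in plane]
                    for plane in target_half]
confident_loss = voxelwise_cross_entropy(confident_logits, target_half)
```

With uniform logits over three classes the loss equals ln 3, which serves as a chance-level reference point. In a GAN setup such as the one described above, a loss of this kind would typically be combined with the adversarial terms from the discriminators; how the thesis weights those terms is not stated in the abstract.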
Citation Formats
A. Abbasi, “Deep 3D semantic scene extrapolation,” M.S. - Master of Science, Middle East Technical University, 2018.