ATTENTION-BASED OBJECT PLACEMENT FOR IMAGE COMPOSITING

2023-08-11
Çağlar, Akif
Image composition is a generative computer vision task with applications in fields such as synthetic data generation and advertising. It can be defined as constructing realistic novel images from given image components. Image composition encompasses several subtasks, including harmonization, object placement, and shadow generation. Object placement is the task of placing a given foreground object onto a given background in a logical manner, taking into account factors such as object size, supporting ground, and occlusion. The object placement literature has evolved alongside advances in neural networks, and convolutional neural networks (CNNs), generative adversarial networks (GANs), and transformers have become the key components of state-of-the-art approaches. Among these, transformer-based approaches perform best, owing to their attention mechanisms. Attention-based approaches in the field follow two paths: one regresses the transformation vector using cross-attention, while the other uses self-attention to produce a placement-rationality heatmap from which the best position for the object is selected. However, a comprehensive performance comparison of these two approaches is lacking in the literature. In this work, we examine and compare the performance of these two models to provide insight into the next steps for the field. Additionally, we explore the effect of providing foreground object class encodings to attention-based object placement methods. Furthermore, we analyze procedures for extracting a placement from the placement-rationality heatmap output and propose a new procedure. Results show that employing a cross-attention mechanism while regressing the transformation vector is the superior design choice for object placement neural models. Furthermore, the findings indicate that class encodings bring a greater benefit to the architecture with the more complex attention mechanism. Lastly, the results show that the proposed placement extraction procedure is more effective than the one employed in the literature.
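To illustrate what extracting a placement from a placement-rationality heatmap can mean, the sketch below picks the highest-scoring cell of a 2-D heatmap and returns its centre in normalized image coordinates. This is only a simple argmax baseline written for illustration. The abstract does not describe the procedure used in the literature or the one proposed in the thesis, so the function name extract_placement, the heatmap layout (rows by columns) and the normalization are all assumptions.

```python
from typing import List, Tuple


def extract_placement(heatmap: List[List[float]]) -> Tuple[float, float]:
    """Return the (x, y) centre of the highest-scoring cell, normalised to [0, 1].

    Hypothetical helper: a plain argmax over the heatmap, not the thesis's
    proposed extraction procedure.
    """
    if not heatmap or not heatmap[0]:
        raise ValueError("expected a non-empty 2-D heatmap")
    height, width = len(heatmap), len(heatmap[0])

    # Scan every cell and keep the coordinates of the largest score.
    # On ties, max() keeps the first cell in row-major order.
    best_row, best_col = max(
        ((r, c) for r in range(height) for c in range(width)),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )

    # Use cell centres so that a 1x1 heatmap maps to (0.5, 0.5).
    return (best_col + 0.5) / width, (best_row + 0.5) / height


# Toy 4x8 heatmap with a single peak at row 1, column 6 (assumed data).
toy_heatmap = [[0.0] * 8 for _ in range(4)]
toy_heatmap[1][6] = 0.9
x, y = extract_placement(toy_heatmap)
```

A single argmax ignores the object's scale and any secondary peaks, which is one reason the choice of extraction procedure can affect results.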
Citation Formats
A. Çağlar, “ATTENTION-BASED OBJECT PLACEMENT FOR IMAGE COMPOSITING,” M.S. - Master of Science, Middle East Technical University, 2023.