ATTENTION-BASED OBJECT PLACEMENT FOR IMAGE COMPOSITING
Date
2023-8-11
Author
Çağlar, Akif
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Image composition is a generative computer vision task with applications in fields such as synthetic data generation and advertising. It can be defined as constructing realistic novel images from given image components. Image composition encompasses several subtasks, including harmonization, object placement, and shadow generation. Object placement is the task of placing a given foreground object onto a given background in a plausible manner, taking into account factors such as object size, supporting ground, and occlusion. The object placement literature has evolved alongside advances in neural networks: convolutional neural networks (CNNs), generative adversarial networks (GANs), and transformers have become the key components of state-of-the-art approaches. Among these, transformer-based approaches give the best performance, owing to their attention mechanisms. Attention-based approaches in the field follow two paths: one regresses the transformation vector using cross-attention, while the other uses self-attention to produce a placement-rationality heatmap from which the best position for the object is selected. However, the literature lacks a comprehensive performance comparison of these two approaches. In this work, we examine and compare the performance of these two models to provide insight into the next steps for the field. Additionally, we explore the effect of providing foreground object class encodings to attention-based object placement methods. Furthermore, we analyze procedures for extracting placements from the placement-rationality heatmap output and propose a new procedure. The results show that regressing the transformation vector with a cross-attention mechanism is the superior design choice for neural object placement models. The findings also indicate that providing class encodings benefits the architecture with the more complex attention mechanism more. Lastly, the results show that the proposed placement extraction procedure is more effective than the one employed in the literature.
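As an illustration only, the simplest way to turn a placement-rationality heatmap into a position is to pick its highest-scoring cell. The sketch below assumes the heatmap is a plain 2D grid of scores and uses a hypothetical helper, `extract_placement`; it is a naive argmax baseline for intuition and does not reproduce the extraction procedure proposed in this work.

```python
from typing import List, Tuple


def extract_placement(heatmap: List[List[float]]) -> Tuple[float, float, float]:
    """Return (x_center, y_center, score) for the highest-scoring cell.

    Assumption: `heatmap` is a 2D grid (rows x cols) of rationality scores.
    Coordinates are normalized to [0, 1] and refer to the cell center.
    """
    if not heatmap or not heatmap[0]:
        raise ValueError("heatmap must be a non-empty 2D grid")

    rows, cols = len(heatmap), len(heatmap[0])
    best_score = float("-inf")
    best_rc = (0, 0)

    # Plain argmax over all cells; ties keep the first cell encountered.
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > best_score:
                best_score, best_rc = score, (r, c)

    r, c = best_rc
    return ((c + 0.5) / cols, (r + 0.5) / rows, best_score)


# Toy 3x4 heatmap with made-up scores, for illustration only.
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.3, 0.9, 0.4, 0.1],
    [0.2, 0.5, 0.3, 0.1],
]
x, y, score = extract_placement(heatmap)
```

A single argmax ignores object scale and neighboring scores, which is one reason the choice of extraction procedure can affect placement quality.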
Subject Keywords
image compositing, object placement, attention, class encoding
URI
https://hdl.handle.net/11511/105222
Collections
Graduate School of Natural and Applied Sciences, Thesis
Citation Formats
IEEE
A. Çağlar, “ATTENTION-BASED OBJECT PLACEMENT FOR IMAGE COMPOSITING,” M.S. - Master of Science, Middle East Technical University, 2023.