Improved Image Generation in Normalizing Flows through a Multi-Scale Architecture and Variational Training

Date

2022-8-31

Author

Sayın, Deniz

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

In recent years, generative models have produced very high-fidelity samples in natural image generation tasks, especially with approaches based on generative adversarial networks and denoising diffusion models. Normalizing flow models are another class of generative models, based on learning invertible mappings between the latent space and the image space. They possess desirable features such as exact density estimation and simple maximum-likelihood-based training, which can offer theoretical guarantees. While state-of-the-art normalizing flow models can produce high-fidelity images on specific simple image generation tasks such as faces and bedrooms, they typically fail to produce sensible results on difficult natural image datasets containing a multitude of underlying classes. We propose an approach focused on improving natural image generation using a new normalizing flow model, in which we start by generating a small natural image and refine it step by step with conditional normalizing flow models performing 2x super-resolution. We also propose a new feature-level augmentation method for conditional encodings, rooted in variational inference, to make the intermediate models in our cascade more robust against noise and artifacts coming from previous levels of the cascade. We perform experiments on the CelebA and CIFAR-10 datasets, show our qualitative results, and compare our generations with state-of-the-art approaches using the FID metric.
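
The "exact density estimation" property mentioned in the abstract follows from the change-of-variables formula, log p(x) = log p_Z(f(x)) + log |det ∂f/∂x|, for an invertible map f from image space to latent space. The sketch below is a toy illustration only, not the thesis's architecture: it assumes a single elementwise affine flow with a standard normal latent, and the names `forward`, `inverse`, `log_likelihood`, `shift`, and `log_scale` are hypothetical.

```python
import math


def forward(x, shift, log_scale):
    """Map data x to latent z with an elementwise affine flow.

    Returns z and log|det J| of the map x -> z (assumed toy flow).
    """
    z = [(xi - s) * math.exp(-ls) for xi, s, ls in zip(x, shift, log_scale)]
    # The Jacobian is diagonal with entries exp(-log_scale_i).
    log_det = -sum(log_scale)
    return z, log_det


def inverse(z, shift, log_scale):
    """Map latent z back to data space (the sampling direction)."""
    return [zi * math.exp(ls) + s for zi, s, ls in zip(z, shift, log_scale)]


def log_likelihood(x, shift, log_scale):
    """Exact log-density of x via the change-of-variables formula."""
    z, log_det = forward(x, shift, log_scale)
    # Log-density of z under a standard normal latent distribution.
    log_pz = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
    return log_pz + log_det
```

Maximum-likelihood training would maximize `log_likelihood` over the flow parameters; practical models stack many such invertible layers with learned, input-dependent parameters.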

Subject Keywords

generative models, natural image generation, normalizing flows, variational inference

URI

https://hdl.handle.net/11511/99466

Collections

Graduate School of Natural and Applied Sciences, Thesis

Citation Formats

D. Sayın, “Improved Image Generation in Normalizing Flows through a Multi-Scale Architecture and Variational Training,” M.S. - Master of Science, Middle East Technical University, 2022.