End-to-end learned image compression with conditional latent space modelling for entropy coding

Yeşilyurt, Aziz Berkay
This thesis presents a lossy image compression system based on an end-to-end trainable neural network. Traditional compression algorithms use linear transformation, quantization, and entropy coding steps that are designed from simple models of the data and aim for low computational complexity. In neural network based image compression methods, processing steps such as the transformation and entropy coding are performed with neural networks. Neural networks enable transforms and entropy-coding probability models that can process or represent data with far more complex dependencies than simple models allow, at the expense of higher computational complexity than traditional methods.

One major line of work on neural network based lossy image compression uses an autoencoder-type neural network for the transform and inverse transform of the compression system. The quantization of the latent variables, i.e. the transform coefficients, and the arithmetic coding of the quantized latent variables are done with traditional methods. However, the probability distribution of the latent variables, which the arithmetic encoder works with, is also represented with a neural network. The parameters of all neural networks in the system are learned jointly from a training set of real images by minimizing the rate-distortion cost.

One major work assumes the latent variables within a single channel (i.e. feature map or signal band) are independent and learns a single distribution model per channel. The same authors then extend their work with a hyperprior neural network that captures the dependencies in the latent representation and significantly improves compression performance. This thesis uses an alternative method to exploit the dependencies of the latent representation. The joint density of the latent representation is modeled as a product of conditional densities, which are learned using neural networks.
However, each latent variable is not conditioned on all previous latent variables, as in the chain rule for factoring joint distributions, but only on a few of them: the left, upper, and upper-left spatial neighbors of that latent variable, under a Markov property assumption. The compression performance is on par with the hyperprior-based work, but the conditional densities require a much simpler network than the hyperprior network in the literature. While the conditional densities require far less training time than the hyperprior-based neural network, owing to their simplicity and smaller number of parameters, their inference time is longer.
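The factorization described above can be illustrated with a minimal sketch. The code below is not the thesis implementation; it assumes a Gaussian conditional model integrated over unit quantization bins (a common choice in learned compression) and uses a single linear layer as a hypothetical stand-in for the small conditional-density network. It estimates the code length of one quantized latent channel from the left, upper, and upper-left neighbors of each variable.

```python
import math
import numpy as np

def causal_context(y, i, j):
    """Left, upper, and upper-left neighbors of y[i, j], zero-padded at borders."""
    left = y[i, j - 1] if j > 0 else 0.0
    up = y[i - 1, j] if i > 0 else 0.0
    up_left = y[i - 1, j - 1] if (i > 0 and j > 0) else 0.0
    return np.array([left, up, up_left])

def cond_params(ctx, w_mu, w_sigma):
    """Hypothetical stand-in for the conditional-density network: maps the
    three-neighbor context to a mean and a positive scale (one linear layer)."""
    return float(ctx @ w_mu), float(np.exp(ctx @ w_sigma))

def gauss_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def rate_bits(y, w_mu, w_sigma):
    """Estimated code length: -sum_ij log2 p(y[i,j] | left, up, up-left),
    integrating the conditional Gaussian over the bin [y - 0.5, y + 0.5]."""
    total = 0.0
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            mu, sigma = cond_params(causal_context(y, i, j), w_mu, w_sigma)
            p = gauss_cdf(y[i, j] + 0.5, mu, sigma) - gauss_cdf(y[i, j] - 0.5, mu, sigma)
            total -= math.log2(max(p, 1e-12))
    return total

rng = np.random.default_rng(0)
y = rng.integers(-3, 4, size=(4, 4)).astype(float)  # toy quantized latent channel
w_mu = np.array([0.3, 0.3, -0.1])                   # illustrative weights
w_sigma = np.zeros(3)                               # scale fixed at 1 here
print(round(rate_bits(y, w_mu, w_sigma), 2))        # total bits for the channel
```

Because each variable depends only on already-decoded neighbors, the same per-position probabilities can drive an arithmetic coder, but decoding must visit positions sequentially in raster order, which is consistent with the longer inference time noted above.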