Effect of quantization on the performance of deep networks

Kütükcü, Başar
Deep neural networks performed greatly for many engineering problems in recent years. However, power and memory hungry nature of deep learning algorithm prevents mobile devices to benefit from the success of deep neural networks. The increasing number of mobile devices creates a push to make deep network deployment possible for resource-constrained devices. Quantization is a solution for this problem. In this thesis, different quantization techniques and their effects on deep networks are examined. The techniques are benchmarked by their success and memory requirements. The effects of quantization are examined for different network architectures including shallow, overparameterized, deep, residual, efficient models. Architecture specific problems are observed and related solutions are proposed. Quantized models are compared with ground-up efficiently designed models. The advantages and disadvantages of each technique are examined. Standard and quantized convolution operations implemented in real systems ranging from low power embedded systems to powerful desktop computer systems. Computation time and memory requirements are examined in these real systems.
Citation Formats
B. Kütükcü, “Effect of quantization on the performance of deep networks,” Thesis (M.S.) -- Graduate School of Natural and Applied Sciences. Electrical and Electronics Engineering., 2020.