Robust quality metrics for assessing multimodal data

Konuk, Barış
In this thesis, a novel, robust, objective, no-reference video quality assessment (VQA) metric, the Spatio-Temporal Network-aware Video Quality Metric (STN-VQM), is proposed for estimating perceived video quality under compression and transmission distortions. STN-VQM uses parameters reflecting the spatiotemporal characteristics of the video, such as spatial complexity and motion, together with parameters representing distortions due to compression and transmission, such as bit rate and packet loss ratio. STN-VQM has been trained on the Laboratory for Image and Video Engineering (LIVE) VQA database of the University of Texas at Austin, and evaluated on the LIVE, Ecole Polytechnique Fédérale de Lausanne (EPFL) - Politecnico di Milano (PoliMI), and Instituto de Telecomunicações - Instituto Superior Técnico (IT-IST) VQA databases, as well as on the video streams in the University of Plymouth audiovisual quality assessment (AVQA) database. STN-VQM is shown to predict perceived video quality accurately on these databases, which span a wide range of video contents, codecs, spatial resolutions, bit rates, frame rates, and packet loss conditions. Comparison with existing state-of-the-art VQA metrics indicates that STN-VQM provides promising results.

Moreover, a novel, objective, no-reference audio quality assessment (AQA) metric is introduced to predict perceived audio quality under compression and transmission distortions. The proposed AQA metric estimates perceived audio quality from parameters such as sampling frequency, bit rate, and packet loss ratio. It has been trained and evaluated on two different AQA databases, which cover different audio encoding types, and is shown to predict perceived audio quality reliably on both.
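As a rough illustration of the kind of spatiotemporal parameters mentioned above (spatial complexity and motion), the sketch below computes ITU-T P.910-style spatial information (SI) and temporal information (TI) measures, which are common proxies for these characteristics. The function names and the use of SI/TI here are illustrative assumptions; STN-VQM's actual feature definitions may differ.

```python
# Illustrative SI/TI-style features (ITU-T P.910 proxies) for spatial
# complexity and motion; NOT the thesis's actual STN-VQM features.
import numpy as np

def spatial_information(frame):
    """SI: std. dev. of the gradient magnitude of a grayscale frame."""
    gy, gx = np.gradient(frame.astype(float))
    return float(np.std(np.hypot(gx, gy)))

def temporal_information(prev_frame, frame):
    """TI: std. dev. of the pixel-wise frame difference (motion proxy)."""
    return float(np.std(frame.astype(float) - prev_frame.astype(float)))

def video_features(frames):
    """Maximum SI and TI over a frame sequence, as in ITU-T P.910."""
    si = max(spatial_information(f) for f in frames)
    ti = max(temporal_information(p, f) for p, f in zip(frames, frames[1:]))
    return si, ti

# Synthetic 4-frame clip: a bright square drifting right on a flat background.
frames = []
for t in range(4):
    f = np.zeros((32, 32))
    f[8:16, 4 + 4 * t : 12 + 4 * t] = 255.0
    frames.append(f)

si, ti = video_features(frames)
print(si > 0 and ti > 0)  # both features respond to texture and motion
```

A flat, static clip would yield SI = TI = 0, so these measures separate textured, fast-moving content from simple, static content, which is the property a content-aware quality model exploits.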
Finally, an objective, no-reference AVQA metric (namely, Direct AudioVisual Quality Assessment, DAVQA) is obtained by applying the classical approach in the literature, i.e., by combining the perceived video quality estimate, the perceived audio quality estimate, and their product. Moreover, a novel video classification method that groups videos according to their spatiotemporal characteristics is developed. Using this spatiotemporal video classification method, a novel, content-dependent AVQA algorithm (namely, Content-Dependent AudioVisual Quality Assessment, CDAVQA) is designed. The CDAVQA model is shown to be more accurate than the DAVQA model on the audiovisual data in the University of Plymouth AVQA database.
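The classical audiovisual fusion described above can be sketched as a regression of the form AV = a0 + a1·V + a2·A + a3·(V·A), with coefficients fitted by least squares. The coefficient names, the fitting procedure, and the synthetic data below are assumptions for illustration only, not the thesis's trained DAVQA model.

```python
# Minimal sketch of the classical audiovisual fusion:
#   AV = a0 + a1*V + a2*A + a3*(V*A)
# Coefficients and data are hypothetical; this is not the thesis's model.
import numpy as np

def fit_av_fusion(v, a, av):
    """Fit AV = a0 + a1*V + a2*A + a3*V*A via least squares."""
    X = np.column_stack([np.ones_like(v), v, a, v * a])
    coeffs, *_ = np.linalg.lstsq(X, av, rcond=None)
    return coeffs

def predict_av(coeffs, v, a):
    a0, a1, a2, a3 = coeffs
    return a0 + a1 * v + a2 * a + a3 * v * a

# Synthetic scores on a 1-5 MOS-like scale with hypothetical coefficients.
rng = np.random.default_rng(0)
v = rng.uniform(1, 5, 50)
a = rng.uniform(1, 5, 50)
av = 0.5 + 0.4 * v + 0.3 * a + 0.05 * v * a

coeffs = fit_av_fusion(v, a, av)
print(np.round(coeffs, 3))  # recovers the generating coefficients
```

The product term V·A lets the fitted model capture the interaction between audio and video quality (e.g., poor audio dragging down the overall impression more when the video is good), which a purely additive combination cannot express.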