Robust content-based copy detection and information theoretic indexing strategies

Date

2015

Author

Saracoğlu, Ahmet

Today, 100 hours of video are uploaded to YouTube every minute. By the end of 2015, 500 billion hours of video will be viewable from a wide range of sources such as on-demand video, Internet-based television and social networks. As a result, important and unavoidable problems arise: copyright management, numerous duplicates and content discovery. These problems may cause tremendous losses for content owners and broadcasting/hosting companies while diminishing user satisfaction. Efficient duplicate video detection can help solve these problems. Content-Based Copy Detection (CBCD) emerges as a viable alternative to watermarking, the active duplicate detection methodology. In this thesis, the building blocks of a content-based copy detection system are investigated. A novel spatio-temporal global representation is initially proposed that exploits visual features independent of the spatial information. This system is improved by a local interest point-based detection pipeline, which is shown through extensive simulations to outperform global representation approaches. On the other hand, it is observed that the accuracy of local feature approaches is often limited by the presence of uninformative and redundant features extracted from the frame. Moreover, at large scale, the index size and the corresponding amount of memory become a significant bottleneck. In order to decrease the index size while increasing the discriminativeness of the reference feature database, a novel information theoretic indexing method is proposed and improved further by the introduced entropy estimator. This estimator is shown to yield more robust results than naïve frequentist techniques.
Furthermore, comprehensive experiments using the proposed method show that the same detection performance is achieved with only a fraction of the reference features, and that for some transformations a Normalized Detection Cost Rate (NDCR) of 0.00 is achieved, which was not previously possible with full indexing. Extending this foundation, another method that exploits the distributions of local features in a temporal volume is also provided. With this temporal approach, a 31% to 83% improvement in NDCR is observed for most of the transformations. Finally, in order to capture the dependence of multiple features in a given frame, the fundamentals of interaction information are discussed and a visual phrase representation for content-based copy detection is introduced. Experimental evaluations show that the proposed visual phrase representation and multivariate feature selection approaches are competitive with the state of the art.
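
As a rough illustration of why the choice of entropy estimator matters for the indexing idea described in the abstract, the minimal Python sketch below contrasts a naive frequentist (plug-in) entropy estimate with the Miller-Madow bias-corrected estimate. The Miller-Madow correction is a standard textbook estimator swapped in here purely for illustration; the abstract does not specify the estimator introduced in the thesis. The function names and the toy label data are hypothetical.

```python
import math
from collections import Counter


def plugin_entropy(counts):
    """Naive frequentist (plug-in) entropy in bits, from a list of counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def miller_madow_entropy(counts):
    """Plug-in entropy plus the Miller-Madow bias correction, in bits.

    Illustrative stand-in only; not the estimator proposed in the thesis.
    """
    total = sum(counts)
    observed_bins = sum(1 for c in counts if c > 0)
    # The correction (K - 1) / (2N) is in nats; convert it to bits.
    correction_nats = (observed_bins - 1) / (2 * total)
    return plugin_entropy(counts) + correction_nats / math.log(2)


# Hypothetical example: quantized labels observed for one reference feature.
labels = ["w1", "w2", "w1", "w3", "w1", "w2"]
counts = list(Counter(labels).values())
print(plugin_entropy(counts), miller_madow_entropy(counts))
```

With few observations the plug-in estimate tends to underestimate entropy, so the corrected value is always at least as large; the gap shrinks as the number of samples grows.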

Subject Keywords

Content-based image retrieval, Digital media, Digital video, Data structures (Computer science), Database management

URI

http://etd.lib.metu.edu.tr/upload/12618611/index.pdf
https://hdl.handle.net/11511/24530

Collections

Graduate School of Natural and Applied Sciences, Thesis

Citation Formats

A. Saracoğlu, “Robust content-based copy detection and information theoretic indexing strategies,” Ph.D. - Doctoral Program, Middle East Technical University, 2015.