Utilization of texture, contrast and color homogeneity for detecting and recognizing text from video frames

Detecting and recognizing text within video frames makes it possible to index and manage large video archives more efficiently. Videotext has inherent properties that make it detectable: a distinguishing texture, higher contrast against the background, and uniform color. By exploiting these properties, text regions can be detected and the image binarized for character recognition. In this paper, a complete framework for the detection and recognition of videotext is presented. The results of Gabor-based texture analysis, contrast-based segmentation, and color homogeneity are merged to obtain a minimum number of candidate regions before binarization. The recognition rate of the system is tested for various combinations, and the resulting rates are observed to be reasonable for most practical purposes.
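The abstract does not say how each cue is computed or how the three results are merged, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a single-orientation complex Gabor filter for texture, local standard deviation of intensity for contrast, low local variance of brightness-normalized color (chroma) as a stand-in for color homogeneity, and a plain logical AND as the merge step. All function names, window sizes, and thresholds are hypothetical.

```python
import numpy as np


def box_mean(img, k):
    """Mean over a k x k window (k odd), via an integral image with edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    ii = np.pad(padded, ((1, 0), (1, 0)), mode="constant").cumsum(0).cumsum(1)
    h, w = img.shape
    total = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return total / (k * k)


def local_variance(img, k):
    """Variance over a k x k window, clipped at zero against rounding error."""
    return np.maximum(box_mean(img * img, k) - box_mean(img, k) ** 2, 0.0)


def gabor_energy(gray, wavelength=4.0, sigma=2.0, theta=0.0, size=9):
    """Magnitude of a complex Gabor response (one orientation, one scale)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    kernel = envelope * np.exp(2j * np.pi * xr / wavelength)
    kernel = kernel - kernel.mean()  # zero DC: flat regions give no response
    # Circular convolution through the FFT, then re-center the output.
    resp = np.fft.ifft2(np.fft.fft2(gray) * np.fft.fft2(kernel, s=gray.shape))
    return np.abs(np.roll(resp, (-half, -half), axis=(0, 1)))


def candidate_text_mask(rgb, k=9, rel_thresh=0.3, chroma_var_max=0.02):
    """AND together three per-pixel cues (assumed merge rule, thresholds hypothetical)."""
    gray = rgb.mean(axis=2)
    texture = gabor_energy(gray, size=k)
    contrast = np.sqrt(local_variance(gray, k))
    # Chroma: color with brightness divided out, so white and gray look alike.
    chroma = rgb / (rgb.sum(axis=2, keepdims=True) + 1e-6)
    chroma_var = sum(local_variance(chroma[..., c], k) for c in range(3))
    textured = texture > rel_thresh * texture.max()
    contrasty = contrast > rel_thresh * contrast.max()
    homogeneous = chroma_var < chroma_var_max
    return textured & contrasty & homogeneous
```

Calling `candidate_text_mask` on a float RGB array of shape (height, width, 3) returns a boolean mask of the same height and width. In a full pipeline, connected regions of that mask would be the candidates passed on to binarization and character recognition. A practical system would likely use several Gabor orientations and scales rather than the single filter assumed here.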


Alignment of uncalibrated images for multi-view classification
Arık, Sercan Ömer; Vural, Elif; Frossard, Pascal (2011-12-29)
Efficient solutions for the classification of multi-view images can be built on graph-based algorithms when little information is known about the scene or cameras. Such methods typically require a pairwise similarity measure between images, where a common choice is the Euclidean distance. However, the accuracy of the Euclidean distance as a similarity measure is restricted to cases where images are captured from nearby viewpoints. In settings with large transformations and viewpoint changes, alignment of im...
Image annotation with semi-supervised clustering
Sayar, Ahmet; Yarman Vural, Fatoş Tunay; Department of Computer Engineering (2009)
Image annotation is defined as generating a set of textual words for a given image, learning from the available training data consisting of visual image content and annotation words. Methods developed for image annotation usually make use of region clustering algorithms to quantize the visual information. Visual codebooks are generated from the region clusters of low level visual features. These codebooks are then matched with the words of the text document related to the image, in various ways. In this th...
A complexity-utility framework for optimizing quality of experience for visual content in mobile devices
Önür, Özgür Deniz; Alatan, Abdullah Aydın; Department of Electrical and Electronics Engineering (2012)
Subjective video quality and video decoding complexity are jointly optimized in order to determine the video encoding parameters that will result in the best Quality of Experience (QoE) for an end user watching a video clip on a mobile device. Subjective video quality is estimated by an objective criterion, the video quality metric (VQM), and a method for predicting the video quality of a test sequence from the available training sequences with similar content characteristics is presented. Standardized spatial i...
Exploiting information extraction techniques for automatic semantic video indexing with an application to Turkish news videos
Kucuk, Dilek; Yazıcı, Adnan (Elsevier BV, 2011-08-01)
This paper targets the problem of automatic semantic indexing of news videos by presenting a video annotation and retrieval system which is able to perform automatic semantic annotation of news video archives and provide access to the archives via these annotations. The presented system relies on the video texts as the information source and exploits several information extraction techniques on these texts to arrive at representative semantic information regarding the underlying videos. These techniques ...
Depth assisted object segmentation in multi-view video
Cigla, Cevahir; Alatan, Abdullah Aydın (2008-01-01)
In this work, a novel and unified approach for multi-view video (MVV) object segmentation is presented. In the first stage, a region-based graph-theoretic color segmentation algorithm is proposed, in which the popular Normalized Cuts segmentation method is improved with some modifications on its graph structure. Segmentation is obtained by recursive bi-partitioning of a weighted graph of an initial over-segmentation mask. The available segmentation mask is also utilized during dense depth map estimation ste...
Citation Formats
S. Tekinalp and A. A. Alatan, “Utilization of texture, contrast and color homogeneity for detecting and recognizing text from video frames,” 2003, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/55182.