Superpixel based efficient image representation for segmentation and classification

Taşlı, Hüseyin Emrah
The wide availability of visual capture and display devices with increasing resolution and affordable prices has made visual data an indispensable part of our lives. The enormous amount of visual data produced every day is captured, stored and sometimes processed for further analysis. In this era of technological improvement, with an exponential increase in the number and capability of devices, researchers have focused on efficient and accurate ways to reach, store, analyse and display the data for various purposes. On the capture side, the number of cameras has rapidly increased in close correlation with the number of mobile phones with built-in cameras. Along with the increase in quantity, the quality of the sensors has also improved in terms of resolution, color/brightness and noise performance. On the other side of the pipeline, there have been major changes in displays over the last couple of decades. With the introduction of plasma and LCD (liquid-crystal display) panels, displays have rapidly become thinner. This reduction in depth, together with lower power consumption, also made mobile displays possible. Therefore, mobile devices with high-resolution displays can easily fit in our pockets. Moreover, another major stepping stone towards a richer visual experience is the introduction of 3D-capable displays in different sizes and resolutions. The popularity of 3D TVs has increased considerably in the last couple of years, and mobile devices with 3D capability have also been introduced to the market. However, the fast progress on the display side has not been matched on the capture and broadcast side. Therefore, the popularity of 3D devices has been lower than expected. Various factors could be counted as causes of this slower reaction. These factors and possible solutions are presented in this thesis.
This thesis deals with various aspects of research in visual content analysis and display technologies. The author's previous experience in real-time processing of image/video data, human visual perspectives for objective/subjective quality analysis, stereoscopy and 3D perception, image understanding for object recognition, and image feature descriptors using low-, mid- and region-level visual cues has been extensively incorporated in this thesis. The proposed techniques have been applied to real-world scenarios, and the results are supported with performance evaluations using objective and subjective quality metrics. Superpixel extraction is proposed as an efficient image representation tool. It has been shown to offer computational efficiency with high segmentation performance. Superpixels are extracted using a color and spatial distance metric in which the weighting is defined as a trade-off parameter. In extensive comparative tests against the state of the art, the proposed scheme is shown to offer a strong alternative to current superpixel and supervoxel extraction methods, with faster execution times and competitive segmentation performance. The extracted superpixels have been further utilized for user-assisted image segmentation. The user provides assistance by drawing lines on representative parts of the image to define foreground and background regions. An energy minimization technique is then used to define the most likely regions to be segmented. The acquired foreground segments could further be used for rendering the stereo pair of an image for 3D visualization purposes. The same energy formulation is also extended to stereo and video footage for completeness. The segmented superpixel patches are also presented as mid-level information sources and applied to the image classification task.
Pixel-wise image descriptors are studied and extended using the proposed mid-level region descriptor in order to capture the complementary mid-level information present in the image. The experimental results provide supporting evidence for the proposal, with classification scores increasing considerably.
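
The abstract describes superpixel extraction as driven by a color and spatial distance metric whose weighting acts as a trade-off parameter. The minimal Python sketch below illustrates that idea only; it is not the thesis's actual formulation. The additive form of the distance, the names `combined_distance`, `nearest_center` and `tradeoff`, and the sample values are all assumptions made for illustration.

```python
import math

def combined_distance(color_a, color_b, pos_a, pos_b, tradeoff):
    """Color distance plus weighted spatial distance.

    Assumption: a simple additive form. `tradeoff` is a hypothetical
    name for the weighting parameter mentioned in the abstract.
    """
    d_color = math.dist(color_a, color_b)   # Euclidean distance in color space
    d_spatial = math.dist(pos_a, pos_b)     # Euclidean distance in the image plane
    return d_color + tradeoff * d_spatial

def nearest_center(pixel_color, pixel_pos, centers, tradeoff):
    """Return the index of the superpixel center closest to the pixel.

    `centers` is a list of (color, position) tuples.
    """
    return min(
        range(len(centers)),
        key=lambda i: combined_distance(
            pixel_color, centers[i][0], pixel_pos, centers[i][1], tradeoff
        ),
    )

# Two illustrative centers: (color, position)
centers = [((0, 0, 0), (0, 0)), ((10, 10, 10), (5, 5))]

# The pixel's color is close to center 1, but its position matches center 0.
label_color_only = nearest_center((9, 9, 9), (0, 0), centers, tradeoff=0.0)
label_compact = nearest_center((9, 9, 9), (0, 0), centers, tradeoff=10.0)
```

With `tradeoff=0.0` only color similarity matters, so the pixel joins the center with the closest color. A large `tradeoff` favors spatial proximity and yields more compact regions.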


Coding algorithms for 3DTV - A survey
Smolic, Aljoscha; Mueller, Karsten; Stefanoski, Nikolce; Ostermann, Joern; Gotchev, Atanas; Akar, Gözde; Triantafyllidis, Georgios; Koz, Alper (2007-11-01)
Research efforts on 3DTV technology have been strengthened worldwide recently, covering the whole media processing chain from capture to display. Different 3DTV systems rely on different 3-D scene representations that integrate various types of data. Efficient coding of these data is crucial for the success of 3DTV. Compression of pixel-type data including stereo video, multiview video, and associated depth or disparity maps extends available principles of classical video coding. Powerful algorithms and ope...
3D Face Reconstruction Using Stereo Images and Structured Light
OZTURK, Ahmet Oguz; Halıcı, Uğur; ULUSOY PARNAS, İLKAY; AKAGUNDUZ, Erdem (2008-04-22)
In this paper, the 3D face scanner that we developed using stereo cameras and structured light together is presented. Structured light with a pattern of vertical lines is used to create feature points and to match them easily. The 3D point cloud obtained by stereo analysis is post-processed to obtain the 3D model in OBJ format.
3D Object Modeling by Structured Light and Stereo Vision
Ozenc, Ugur; Tastan, Oguzhan; GÜLLÜ, MEHMET KEMAL (2015-05-19)
In this paper, we demonstrate a 3D object modeling system utilizing a setup which consists of two CMOS cameras and a DLP projector by making use of structured light and stereo vision. The calibration of the system is carried out using a calibration pattern. The images are taken with the stereo camera pair by projecting structured light onto the object, and the correspondence problem is solved by both the epipolar constraint of stereo vision and the Gray code constraint of structured light. The first experimental results s...
Privacy protection of tone-mapped HDR images using false colours
ÇİFTÇİ, Serdar; Akyüz, Ahmet Oğuz; PİNHEİRO, Antonio M. G.; Ebrahimi, Touradj (2017-12-01)
High dynamic range (HDR) imaging has been developed for improved visual representation by capturing a wide range of luminance values. Owing to its properties, HDR content might lead to a larger privacy intrusion, requiring new methods for privacy protection. Previously, false colours were proved to be effective for assuring privacy protection for low dynamic range (LDR) images. In this work, the reliability of false colours when used for privacy protection of HDR images represented by tone-mapping operators...
An Efficient graph-theoretical approach for interactive mobile image and video segmentation
Şener, Ozan; Alatan, Abdullah Aydın; Department of Electrical and Electronics Engineering (2013)
Over the past few years, processing of visual information by mobile devices has become more affordable due to advances in mobile technologies. Efficient and accurate segmentation of objects from an image or video leads to many interesting multimedia applications. In this study, we address interactive image and video segmentation on mobile devices. We first propose a novel interaction methodology leading to better satisfaction based on subjective user evaluation. Due to small screens of the mobile devices, error ...
Citation Formats
H. E. Taşlı, “Superpixel based efficient image representation for segmentation and classification,” Ph.D. - Doctoral Program, Middle East Technical University, 2013.