Utilization of dense depth information for monoview object detection and instance segmentation

Çakırgöz, Çağlayan Can
Object detection aims to detect objects of certain classes in an image by bounding them in rectangular boxes, whereas instance segmentation tries to detect objects at the pixel level. Deep learning techniques, which have improved greatly over the last decade, are used for these tasks as well, and they have achieved significant success over traditional methods. Similar improvements can be observed in dense depth estimation, which deals with deducing dense depth information of a scene from a single image. Previous works have shown that object detection and instance segmentation performance can be improved by incorporating sensor depth information. This thesis studies whether similar improvements are possible when depth information is estimated from images instead of being provided directly by sensors. Our research has shown that incorporating estimated depth data results in higher performance in object detection, although it fails to do so in instance segmentation.
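As a rough illustration of what "incorporating estimated depth data" can mean in practice, the sketch below appends a normalized depth value to every RGB pixel, producing a four-channel input that a detector could consume. This is only one common fusion strategy and is an assumption here; the abstract does not state how the thesis combines depth with image data. The function names `normalize_depth` and `stack_rgbd` are hypothetical, and plain Python lists stand in for the array library a real pipeline would use.

```python
def normalize_depth(depth):
    """Min-max scale a 2D list of depth values to the range [0, 1]."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant depth map carries no relative information; map it to zeros.
        return [[0.0 for _ in row] for row in depth]
    return [[(v - lo) / span for v in row] for row in depth]


def stack_rgbd(rgb, depth):
    """Append normalized depth to each 8-bit RGB pixel, giving (r, g, b, d) in [0, 1].

    Assumption: early fusion by channel stacking, not necessarily the thesis method.
    """
    if len(rgb) != len(depth) or any(len(r) != len(d) for r, d in zip(rgb, depth)):
        raise ValueError("rgb and depth must have the same height and width")
    d_norm = normalize_depth(depth)
    return [
        [(r / 255.0, g / 255.0, b / 255.0, d) for (r, g, b), d in zip(rgb_row, d_row)]
        for rgb_row, d_row in zip(rgb, d_norm)
    ]


# Hypothetical 2x2 image and an estimated depth map of the same size.
rgb = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
depth = [[1.0, 2.0], [3.0, 5.0]]
rgbd = stack_rgbd(rgb, depth)
```

Min-max scaling keeps the depth channel in the same numeric range as the color channels, which may matter when the estimated depth is only defined up to an unknown scale.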


Rescoring detections based on contextual scores in object detection
Zorlu, Ersan Vural; Akbaş, Emre; Department of Computer Engineering (2019)
To detect objects in an image, current state-of-the-art object detectors first define candidate object locations, and then classify each of them into one of the predefined categories or as background. They do so by using the visual features extracted locally from the candidate locations, omitting the rich contextual information embedded in the whole image. Contextual information can be utilized to complement the information extracted locally and thereby to improve object detection accuracy. Researchers have p...
Object Detection with Convolutional Context Features
Kaya, Emre Can; Alatan, Abdullah Aydın (2017-01-01)
A novel extension to the Huh B-ESA object detection algorithm is proposed in order to learn convolutional context features that better determine the boundaries of objects. For input images, the hypothesis windows and the context around those windows are learned through convolutional layers as two parallel networks. The resulting object and context feature maps are combined in such a way that they preserve their spatial relationship. The proposed algorithm is trained and evaluated on PASCAL VOC 2007 detection ben...
Scale invariant representation of 2.5D data
AKAGUNDUZ, Erdem; ULUSOY PARNAS, İLKAY; BOZKURT, Nesli; Halıcı, Uğur (2007-06-13)
In this paper, a scale and orientation invariant feature representation for 2.5D objects is introduced, which may be used to classify, detect and recognize objects even in the presence of clutter and/or occlusion. With this representation, a 2.5D object is defined by an attributed graph structure in which the nodes are the pit and peak regions on the surface. The attributes of the graph are the scales, positions and normals of these pits and peaks. In order to detect these regions a "peakness" (or pi...
Bayramoglu, Neslihan; Akman, Oytun; Alatan, Abdullah Aydın; Jonker, Pieter (2009-09-11)
In the field of vision-based robot actuation, in order to manipulate objects in an environment, background separation and object selection are fundamental tasks that should be carried out in a fast and efficient way. In this paper, we propose a method to segment possible object locations in the scene and recognize them via local-point based representation. Exploiting the resulting 3D structure of the scene via a time-of-flight camera, background regions are eliminated with the assumption that the objects a...
Akman, Oytun; Bayramoglu, Neslihan; Alatan, Abdullah Aydın; Jonker, Pieter (2010-06-09)
Object segmentation has an important role in the field of computer vision for semantic information inference. Many applications such as 3DTV archive systems, 3D/2D model fitting, object recognition and shape retrieval are strongly dependent on the performance of the segmentation process. In this paper we present a new algorithm for object localization and segmentation based on the spatial information obtained via a Time-of-Flight (TOF) camera. 3D points obtained via a TOF camera are projected onto the major...
