Training object detectors by directly optimizing LRP metric

Çam, Barış Can
This thesis focuses on training deep object detection networks by directly optimizing the localisation-recall-precision (LRP) performance metric, which evaluates the classification and localisation performance of an object detector in a unified manner (Oksuz et al., 2018). Unlike the commonly used linear weighting approach, we first aim to optimize the LRP metric implicitly by combining a bounded localisation loss from previous work with a proposed loss function that bounds the range of the classification loss. In addition to this range-balancing approach, we aim to train an object detector with an LRP regressor trained on LRP values collected during the training stage. We show that the proposed regression architecture can estimate LRP values with low error rates. However, attaching the regressor to an object detector as a differentiable LRP error estimator did not yield satisfactory results. Finally, by adapting the perceptron-learning-based approach proposed by Chen et al. (2020), we show that the LRP metric can be embedded as a loss function to train a deep object detector. We examine this perceptron-learning-based approach and propose its generalization to all IoU-based localisation loss functions.
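For orientation, the LRP error of Oksuz et al. (2018) at a fixed confidence threshold can be sketched as below. This is a minimal illustration, not code from the thesis: it assumes detections have already been matched to ground-truth boxes at an IoU threshold tau, and the function name `lrp_error`, its arguments, and the handling of the empty case are hypothetical choices made for the example.

```python
def lrp_error(tp_ious, num_fp, num_fn, tau=0.5):
    """Sketch of the LRP error (lower is better, range [0, 1]).

    tp_ious: IoU of each true-positive detection with its matched ground truth.
    num_fp:  number of false-positive detections.
    num_fn:  number of missed ground-truth objects.
    tau:     IoU threshold used to validate true positives (0 <= tau < 1).
    """
    if not 0.0 <= tau < 1.0:
        raise ValueError("tau must lie in [0, 1)")
    if any(not (tau <= iou <= 1.0) for iou in tp_ious):
        raise ValueError("every true-positive IoU must lie in [tau, 1]")

    total = len(tp_ious) + num_fp + num_fn
    if total == 0:
        # Assumption for this sketch: with no detections and no ground truth,
        # the error is left undefined rather than guessed.
        raise ValueError("LRP is undefined without detections or ground truth")

    # Localisation term: each true positive contributes a normalised IoU error.
    localisation = sum((1.0 - iou) / (1.0 - tau) for iou in tp_ious)

    # False positives and false negatives each contribute a full unit of error.
    return (localisation + num_fp + num_fn) / total
```

Under these assumptions, perfectly localised detections with no false positives or negatives give an error of 0, while a detector with no true positives gives an error of 1. Because the localisation term is divided by 1 - tau, it stays in the same [0, 1] range as the classification terms.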


Kaya, Emre Can; Alatan, Abdullah Aydın (2018-10-10)
A novel extension to proposal-based detection is proposed in order to learn convolutional context features for determining boundaries of objects better. Objects and their context are aimed to be learned through parallel convolutional stages. The resulting object and context feature maps are combined in such a way that they preserve their spatial relationship. The proposed algorithm is trained and evaluated on PASCAL VOC 2007 detection benchmark dataset and yielded improvements in performance over baseline, ...
Weakly supervised instance attention for multisource fine-grained object recognition with an application to tree species classification
Aygunes, Bulut; Cinbiş, Ramazan Gökberk; Aksoy, Selim (2021-06-01)
Multisource image analysis that leverages complementary spectral, spatial, and structural information benefits fine-grained object recognition that aims to classify an object into one of many similar subcategories. However, for multisource tasks that involve relatively small objects, even the smallest registration errors can introduce high uncertainty in the classification process. We approach this problem from a weakly supervised learning perspective in which the input images correspond to larger neighborh...
Representation Learning for Contextual Object and Region Detection in Remote Sensing
Firat, Orhan; Can, Gulcan; Yarman Vural, Fatoş Tunay (2014-08-28)
The performance of object recognition and classification on remote sensing imagery is highly dependent on the quality of extracted features, amount of labelled data and the priors defined for contextual models. In this study, we examine the representation learning opportunities for remote sensing. First we attacked localization of contextual cues for complex object detection using disentangling factors learnt from a small amount of labelled data. The complex object, which consists of several sub-parts is fu...
A Computationally Efficient Appearance-Based Algorithm for Geospatial Object Detection
Arslan, Duygu; Alatan, Abdullah Aydın (2012-04-27)
A computationally efficient appearance-based algorithm for geospatial object detection is presented and evaluated specifically for aircraft detection from satellite imagery. An aircraft operator exploiting the edge information via gray level differences between the aircraft and its background is constructed with Haar-like polygon regions by using the shape information of the aircraft as an invariant. Fast evaluation of the aircraft operator is achieved by means of integral image. Rotated integral images are...
Estimation of Articulatory Trajectories Based on Gaussian Mixture Model (GMM) With Audio-Visual Information Fusion and Dynamic Kalman Smoothing
Özbek, İbrahim Yücel; Hasegawa-Johnson, Mark; Demirekler, Mübeccel (Institute of Electrical and Electronics Engineers (IEEE), 2011-07-01)
This paper presents a detailed framework for Gaussian mixture model (GMM)-based articulatory inversion equipped with special postprocessing smoothers, and with the capability to perform audio-visual information fusion. The effects of different acoustic features on the GMM inversion performance are investigated and it is shown that the integration of various types of acoustic (and visual) features improves the performance of the articulatory inversion process. Dynamic Kalman smoothers are proposed to adapt t...
Citation Formats
B. C. Çam, “Training object detectors by directly optimizing LRP metric,” M.S. - Master of Science, Middle East Technical University, 2020.