Prior knowledge guided weakly supervised object detection and semantic segmentation

2022-2
Baltacı, Fatih
State-of-the-art recognition models in computer vision are trained using annotated training data. Collecting manual annotations for images is a time-consuming and tedious task, and the annotation time and difficulty vary across computer vision tasks. For example, object detection requires bounding-box annotations, which can be difficult to produce, particularly in complex scenes, and semantic segmentation requires pixel-level annotations, which by definition demand a great deal of effort. Weakly supervised learning methods, typically studied for object detection and semantic segmentation, aim to avoid such detailed annotations and instead rely on image-level labels indicating the presence or absence of object categories. Existing results, however, indicate that weakly supervised learning methods tend to produce recognition models that significantly underperform their fully supervised counterparts. To reduce the performance gap between weakly supervised and fully supervised approaches, this thesis explores the use of prior semantic knowledge about object categories to improve weakly supervised training. We inject prior knowledge about object categories, represented in terms of attributes or language-based class embeddings, into existing weakly supervised object detection and semantic segmentation training approaches. Our experimental results show that the proposed method can clearly improve recognition performance in several cases on benchmark datasets.

Suggestions

Neural information retrieval: at the end of the early years
Onal, Kezban Dilek; Zhang, Ye; Altıngövde, İsmail Sengör; Rahman, Md Mustafizur; Karagöz, Pınar; Braylan, Alex; Dang, Brandon; Chang, Heng-Lu; Kim, Henna; McNamara, Quinten; Angert, Aaron; Banners, Edward; Khetan, Vivek; McDonnell, Tyler; Nguyen, An Thanh; Xu, Dan; Wallace, Byron C.; de Rijke, Maarten; Lease, Matthew (Springer Science and Business Media LLC, 2018-06-01)
A recent "third wave" of neural network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs often comprise multiple interconnected layers, work in this area is often referred to as deep learning. Recent years have witnessed an explosive growth of research into NN-based approaches to information retrieval (IR). A significant body of work has now been created. In this ...
Data-driven image captioning via salient region discovery
Kilickaya, Mert; Akkuş, Burak Kerim; Çakıcı, Ruket; Erdem, Aykut; Erdem, Erkut; İKİZLER CİNBİŞ, NAZLI (Institution of Engineering and Technology (IET), 2017-09-01)
In the past few years, automatically generating descriptions for images has attracted a lot of attention in computer vision and natural language processing research. Among the existing approaches, data-driven methods have been proven to be highly effective. These methods compare the given image against a large set of training images to determine a set of relevant images, then generate a description using the associated captions. In this study, the authors propose to integrate an object-based semantic image r...
Competing labels: a heuristic approach to pseudo-labeling in deep semi-supervised learning
Bayrak, Hamdi Burak; Ertekin Bolelli, Şeyda; Yücel, Hamdullah; Department of Scientific Computing (2022-2-10)
Semi-supervised learning is one of the most widely used approaches for reducing the reliance of deep learning models on large-scale labeled data. A commonly used method within this approach is pseudo-labeling. However, pseudo-labeling, especially in its originally proposed form, tends to suffer considerably from noisy training when the assigned labels are false. To mitigate this problem, we investigate the gradient sent to the neural network and propose a heuristic method, called competing label...
Visual Object Tracking with Autoencoder Representations
Besbinar, Beril; Alatan, Abdullah Aydın (2016-05-19)
Deep learning is the discipline of training computational models that are composed of multiple layers, and these methods have recently improved the state of the art in many areas by virtue of large labeled datasets, increases in the computational power of current hardware, and unsupervised training methods. Although such a dataset may not be available for many application areas, the representations obtained by the well-designed networks that have a large representation capacity and trained with enough dat...
3D TRACKING OF PEOPLE WITH RAO-BLACKWELLIZED PARTICLE FILTERS
Topcu, Osman; Orguner, Umut; Alatan, Abdullah Aydın; ERCAN, ALİ ÖZER (2014-04-25)
Visual tracking has an important place among computer vision applications. Visual tracking with particle filters is a well-known methodology. The performance of particle filters depends on efficient sampling of the state space, which in turn depends on the number of particles. In this paper, the Rao-Blackwell technique is applied to particle filters to improve sampling efficiency. Both algorithms are applied to the people tracking problem. Under the same circumstances, the resulting algorithm is demonstrated...
Citation Formats
F. Baltacı, “Prior knowledge guided weakly supervised object detection and semantic segmentation,” M.S. - Master of Science, Middle East Technical University, 2022.