Prior knowledge guided weakly supervised object detection and semantic segmentation

2022-2
Baltacı, Fatih
State-of-the-art recognition models in computer vision are trained on annotated data. Collecting manual annotations for images is time-consuming and tedious, and the time and difficulty involved vary across computer vision tasks. For example, object detection requires bounding-box annotations, which can be difficult to produce, particularly in complex scenes, and semantic segmentation requires pixel-level annotations, which by definition demand a great amount of effort. Weakly supervised learning methods, typically studied for object detection and semantic segmentation, aim to avoid such detailed annotations and instead rely on image-level labels indicating the presence or absence of object categories. Existing results, however, indicate that weakly supervised methods tend to yield recognition models that significantly underperform their fully supervised counterparts. To reduce the performance gap between weakly supervised and fully supervised approaches, this thesis explores the use of prior semantic knowledge about object categories to improve weakly supervised training. We inject prior knowledge about object categories, represented as attributes or language-based class embeddings, into existing weakly supervised object detection and semantic segmentation training approaches. Our experimental results show that the proposed method can clearly improve recognition performance in several cases on benchmark datasets.
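To make the supervision setting concrete, the following is a minimal sketch of training from image-level labels only. It is a generic multiple-instance formulation and an assumption for illustration, not the method proposed in the thesis: per-region class scores are max-pooled into one score per class, and that pooled score is compared against the presence/absence label. The function names and the example numbers are hypothetical, and the sketch does not show how prior knowledge (attributes or class embeddings) is injected.

```python
import math


def image_level_scores(region_scores):
    """Max-pool per-region class scores into one score per class for the image."""
    num_classes = len(region_scores[0])
    return [max(region[c] for region in region_scores) for c in range(num_classes)]


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Hypothetical image with 3 candidate regions and 2 object categories.
# Each row holds one region's per-class probabilities.
region_scores = [
    [0.9, 0.1],
    [0.2, 0.3],
    [0.4, 0.05],
]
# Image-level labels only: class 0 is present, class 1 is absent.
# No bounding boxes or pixel masks are used anywhere.
image_labels = [1, 0]

pooled = image_level_scores(region_scores)
loss = binary_cross_entropy(pooled, image_labels)
```

In this setting, the region that attains the maximum score for a present class can be read as a rough localization, which is why image-level labels alone can still drive a detector, though with the performance gap the abstract describes.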

Citation Formats
F. Baltacı, “Prior knowledge guided weakly supervised object detection and semantic segmentation,” M.S. - Master of Science, Middle East Technical University, 2022.