Multiple kernel learning for first-person activity recognition

2017
Özkan, Fatih
First-person vision applications have recently gained popularity owing to advances in wearable camera technologies. In the literature, existing descriptors have been adapted to first-person videos, and new descriptors have also been proposed. These descriptors have been used in single-kernel methods, which ignore the relative importance of each descriptor. Moreover, first-person videos have different characteristics from third-person videos, which are captured by static cameras. Throughout a first-person video, attributes such as illumination and brightness change considerably, and the movements of the camera wearer create a significant amount of ego-motion. Multiple features are used to capture these different changes in video characteristics; therefore, appropriate feature and kernel selection is needed. In this thesis, local and global motion-related features are used, and a data-driven approach is proposed to select and combine these features and the kernels employed with them. Feature and kernel selection is performed in a probabilistic manner over the trials (boosting rounds) of the well-known AdaBoost algorithm. In the training stage, the classifier that performs best among all candidates is determined in each trial; after all trials, the classifiers that compose the final classifier are determined. In the testing stage, the final classifier decides on activity labels through a voting mechanism. Experiments show that the proposed methods outperform traditional single-kernel SVM-based methods from the literature in terms of recognition accuracy.
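The boosted feature and kernel selection summarized above can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions, not the thesis's implementation: the "probabilistic" trials are assumed to be boosting by resampling with the SAMME variant of multi-class AdaBoost; the SVM trained for each (feature, kernel) pair is replaced by a simple kernel-mean classifier so the code runs without external libraries; and the kernel set (`rbf_kernel`, `chi2_kernel`), the helper names (`train_kernel_mean`, `boost_select`, `vote`) and the toy "motion"/"noise" features are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel (hypothetical candidate kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def chi2_kernel(x, y, gamma=1.0):
    """Exponential chi-square kernel, a common choice for non-negative histograms."""
    d = sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)
    return math.exp(-gamma * d)


def train_kernel_mean(X, y, kernel):
    """Stand-in for a per-(feature, kernel) SVM: predicts the class whose
    training samples have the highest mean kernel similarity to the query."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)

    def classify(x):
        def mean_sim(c):
            return sum(kernel(x, xi) for xi in by_class[c]) / len(by_class[c])
        return max(by_class, key=mean_sim)

    return classify


def boost_select(features, y, kernels, rounds=10, seed=0):
    """SAMME-style AdaBoost over every (feature, kernel) pair.
    features: feature name -> list of per-video vectors (same order as y).
    Returns a list of (alpha, feature_name, kernel_name, classifier)."""
    rng = random.Random(seed)
    n, n_classes = len(y), len(set(y))
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Boosting by resampling: draw a training set according to the weights.
        idx = rng.choices(range(n), weights=w, k=n)
        best = None
        for fname, X in features.items():
            for kname, kernel in kernels.items():
                h = train_kernel_mean([X[i] for i in idx], [y[i] for i in idx], kernel)
                miss = [h(X[i]) != y[i] for i in range(n)]
                err = sum(wi for wi, m in zip(w, miss) if m)
                if best is None or err < best[0]:
                    best = (err, fname, kname, h, miss)
        err, fname, kname, h, miss = best
        if err >= 1.0 - 1.0 / n_classes:
            break  # best candidate is no better than chance: stop
        err = max(err, 1e-10)  # avoid division by zero for a perfect candidate
        alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
        ensemble.append((alpha, fname, kname, h))
        if err <= 1e-10:
            break  # perfect on the training set: nothing left to reweight
        # Up-weight misclassified videos, then renormalize.
        w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def vote(ensemble, video):
    """Alpha-weighted vote; video maps feature name -> vector for one clip."""
    votes = Counter()
    for alpha, fname, _kname, h in ensemble:
        votes[h(video[fname])] += alpha
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(1)
    y = [c for c in range(3) for _ in range(10)]  # 3 toy activities, 10 clips each
    features = {
        # Toy informative feature: first entry depends on the activity label.
        "motion": [[1.0 + c + rng.uniform(-0.3, 0.3), 1.0 + rng.uniform(-0.3, 0.3)] for c in y],
        # Toy uninformative feature: pure noise.
        "noise": [[rng.uniform(0, 3), rng.uniform(0, 3)] for _ in y],
    }
    ensemble = boost_select(features, y, {"rbf": rbf_kernel, "chi2": chi2_kernel})
    print([(f, k) for _, f, k, _ in ensemble])
```

Each round keeps only the (feature, kernel) pair with the lowest weighted training error, so an uninformative feature tends never to enter the ensemble. Replacing `train_kernel_mean` with a real SVM (for example, one trained on a precomputed Gram matrix) would bring the sketch closer to the SVM-based setting the abstract describes.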

Suggestions

Spatial 3D local descriptors for object recognition in RGB-D images
Loğoğlu, K. Berker; Temizel, Alptekin; Kalkan, Sinan; Department of Information Systems (2016)
The introduction of affordable but relatively high-resolution RGB-D sensors, which provide synchronized color and depth, along with efforts on open-source point-cloud processing tools, has boosted research in both computer vision and robotics. One of the key areas that has drawn particular attention is object recognition, since it is a crucial step for various applications. In this thesis, two spatially enhanced local 3D descriptors are proposed for object recognition tasks: Histograms of Spatial Concentric Surf...
Boosted multiple kernel learning for first-person activity recognition
Özkan, Fatih; Arabacı, Mehmet Ali; Sürer, Elif; Temizel, Alptekin (2017-09-02)
Activity recognition from first-person (ego-centric) videos has recently gained attention due to the increasing ubiquity of the wearable cameras. There has been a surge of efforts adapting existing feature descriptors and designing new descriptors for the first-person videos. An effective activity recognition system requires selection and use of complementary features and appropriate kernels for each feature. In this study, we propose a data-driven framework for first-person activity recognition which effec...
Multi-modal Egocentric Activity Recognition Through Decision Fusion
Arabacı, Mehmet Ali; Temizel, Alptekin; Sürer, Elif; Department of Information Systems (2023-01-18)
The use of wearable devices in daily life has grown rapidly with the development of sensor technologies. The most prominent information for wearable devices is collected from optical sensors, which produce videos from an egocentric perspective, called First Person Vision (FPV). FPV has different characteristics from third-person videos because of the large amount of ego-motion and rapid changes in scenes. Vision-based methods designed for third-person videos, where the camera is away from events and actors, canno...
Hierarchical representations for visual object tracking by detection
Beşbınar, Beril; Alatan, Abdullah Aydın; Department of Electrical and Electronics Engineering (2015)
Deep learning is the discipline of training computational models that are composed of multiple layers, and these methods have improved the state of the art in many areas such as visual object detection, scene understanding and speech recognition. The rebirth of these fairly old computational models is usually attributed to the availability of large datasets, the increase in the computational power of current hardware and, more recently, proposed unsupervised training methods that exploit the internal structure of very lar...
Good features to correlate for visual tracking
Gündoğdu, Erhan; Alatan, Abdullah Aydın; Department of Electrical and Electronics Engineering (2017)
Estimating object motion is one of the key components of video processing and the first step in applications which require video representation. Visual object tracking is one way of extracting this component, and it is one of the major problems in the field of computer vision. Numerous discriminative and generative machine learning approaches have been employed to solve this problem. Recently, correlation filter based (CFB) approaches have been popular due to their computational efficiency and notable perfo...
Citation Formats
F. Özkan, “Multiple kernel learning for first-person activity recognition,” M.S. - Master of Science, Middle East Technical University, 2017.