First-Person Activity Recognition with Multimodal Features

In typical third-person videos, the camera is positioned away from the actors involved in the events and is not directly affected by the observed activities. First-person (egocentric) videos, in contrast, are captured by a camera worn by a person and reflect that person's perspective. In these videos, the observer takes part in the events, and the camera undergoes large amounts of ego-motion. Many features proposed for third-person activity recognition can be adapted to the first-person problem, and new features that exploit the specific characteristics of first-person videos have also been proposed recently. In addition, because the camera is directly involved in the activities, features extracted from other modalities, such as audio, can be used together with video-based features. However, different features yield varying recognition performance, and some may carry redundant information. In this talk, I will first summarise the current state of the art in first-person activity recognition. I will then discuss several features and provide a comparative assessment of their performance for first-person activity recognition.
Citation
A. Temizel, “First-Person Activity Recognition with Multimodal Features,” presented at the International Workshop on Computational Intelligence for Multimedia Understanding (IWCIM), Hios, Greece, 02 September 2017.