Spatio-Temporal Analysis of Facial Actions using Lifecycle-Aware Capsule Networks

2021-12-26
Churamani, Nikhil
Kalkan, Sinan
Güneş, Hatice
Most state-of-the-art approaches for Facial Action Unit (AU) detection rely on evaluating static frames, encoding a snapshot of heightened facial activity. In real-world interactions, however, facial expressions are more subtle and evolve over time, requiring AU detection models to learn spatial as well as temporal information. In this work, we focus on both spatial and spatio-temporal features encoding the temporal evolution of facial AU activation. We propose the Action Unit Lifecycle-Aware Capsule Network (AULA-Caps) for AU detection using both frame-level and sequence-level features. At the frame level, the capsule layers of AULA-Caps learn spatial feature primitives to determine AU activations; at the sequence level, they learn temporal dependencies between contiguous frames by focusing on relevant spatio-temporal segments in the sequence. The learnt feature capsules are routed together such that the model learns to selectively focus on spatial or spatio-temporal information depending upon the AU lifecycle. The proposed model is evaluated on the popular BP4D and GFT benchmark datasets, obtaining state-of-the-art results on both.
IEEE International Conference on Automatic Face and Gesture Recognition
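
The sketch below is a minimal, hedged illustration of the general idea described in the abstract: per-frame spatial features, a sequence-level spatio-temporal encoder, and capsules routed by agreement to produce per-AU activations. It is not the authors' AULA-Caps implementation; the class name SpatioTemporalCapsNet, the layer sizes, the use of standard dynamic routing (Sabour et al., 2017), and the simple temporal averaging are all assumptions made for illustration only.

# Illustrative PyTorch sketch of a capsule-based AU detector combining
# frame-level (spatial) and sequence-level (spatio-temporal) features.
# NOT the authors' AULA-Caps: names, sizes, and routing scheme are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_AUS = 12          # e.g. the 12 AUs annotated in BP4D (assumption)
CAPSULE_DIM = 16      # dimensionality of each AU capsule (assumption)


def squash(s, dim=-1, eps=1e-8):
    """Capsule squashing non-linearity: keeps direction, bounds length in [0, 1)."""
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)


class SpatioTemporalCapsNet(nn.Module):
    def __init__(self, num_aus=NUM_AUS, caps_dim=CAPSULE_DIM, num_primary=32, routing_iters=3):
        super().__init__()
        # Frame-level spatial encoder (2D convolutions applied per frame).
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
        )
        # Sequence-level spatio-temporal encoder (3D convolution over contiguous frames).
        self.temporal = nn.Conv3d(128, 128, kernel_size=(3, 3, 3), padding=1)
        # Primary capsules: group channels into num_primary capsules of caps_dim units each.
        self.primary = nn.Conv2d(128, num_primary * caps_dim, 3, stride=2, padding=1)
        self.num_primary, self.caps_dim = num_primary, caps_dim
        self.num_aus, self.routing_iters = num_aus, routing_iters
        # Transformation matrices from primary capsules to AU capsules.
        self.W = nn.Parameter(0.01 * torch.randn(num_primary, num_aus, caps_dim, caps_dim))

    def forward(self, clip):
        # clip: (batch, time, 3, H, W)
        b, t = clip.shape[:2]
        x = self.spatial(clip.flatten(0, 1))                         # (b*t, 128, h, w)
        x = x.view(b, t, *x.shape[1:]).permute(0, 2, 1, 3, 4)        # (b, 128, t, h, w)
        x = F.relu(self.temporal(x)).mean(dim=2)                     # pool over time -> (b, 128, h, w)
        u = self.primary(x)                                          # (b, num_primary*caps_dim, h', w')
        u = u.view(b, self.num_primary, self.caps_dim, -1).mean(-1)  # average over spatial locations
        u = squash(u)                                                # (b, num_primary, caps_dim)
        # Each primary capsule predicts every AU capsule.
        u_hat = torch.einsum('bpd,pade->bpae', u, self.W)            # (b, num_primary, num_aus, caps_dim)
        # Dynamic routing-by-agreement (a common default choice, used here as a placeholder).
        logits = torch.zeros(b, self.num_primary, self.num_aus, device=clip.device)
        for _ in range(self.routing_iters):
            c = logits.softmax(dim=2).unsqueeze(-1)                  # coupling coefficients
            v = squash((c * u_hat).sum(dim=1))                       # (b, num_aus, caps_dim)
            logits = logits + (u_hat * v.unsqueeze(1)).sum(-1)
        # AU activation probability is read off as the length of each AU capsule.
        return v.norm(dim=-1)                                        # (b, num_aus)


if __name__ == "__main__":
    model = SpatioTemporalCapsNet()
    dummy_clip = torch.randn(2, 8, 3, 96, 96)    # batch of 2 clips, 8 frames each
    print(model(dummy_clip).shape)               # torch.Size([2, 12])

In this simplified version the temporal information is collapsed by averaging before routing; the lifecycle-aware selection between spatial and spatio-temporal capsules described in the abstract would replace that averaging step in the actual model.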

Suggestions

Occlusion-aware 3-D multiple object tracking for visual surveillance
Topçu, Osman; Alatan, Abdullah Aydın; Ercan, Ali Özer; Department of Electrical and Electronics Engineering (2013)
This thesis work presents an occlusion-aware particle filter framework for online tracking of multiple people with observations from multiple cameras with overlapping fields of view for surveillance applications. The surveillance problem involves inferring the motives of people from their actions, deduced from their trajectories. Visual tracking is required to obtain these trajectories, and it is a challenging problem due to motion model variations, size and illumination changes, and especially occlusions between mov...
Statistical Analysis and Directional Coding of Layer-based HDR Image Coding Residue
Feyiz, Kutan; Kamışlı, Fatih; Zerman, Emin; Valenzise, Giuseppe; Koz, Alper; Dufaux, Frederic (2017-10-18)
Existing methods for layer-based backward compatible high dynamic range (HDR) image and video coding mostly focus on the rate-distortion optimization of the base layer while neglecting the encoding of the residue signal in the enhancement layer. Although some recent studies handle residue coding by designing function-based fixed global mapping curves for 8-bit conversion and exploiting standard codecs on the resulting 8-bit images, they do not take the local characteristics of residue blocks into account. Inspi...
Engaging with the Scenario: Affect and Facial Patterns from a Scenario-Based Intelligent Tutoring System
Nye, Benjamin; Karumbaiah, Shamya; Tokel, Saniye Tuğba; Core, Mark; Stratou, Giota; Auerbach, Daniel; Georgila, Kallirroi (2018-06-20)
Facial expression trackers output measures for facial action units (AUs), and are increasingly being used in learning technologies. In this paper, we compile patterns of AUs seen in related work as well as use factor analysis to search for categories implicit in our corpus. Although there was some overlap between the factors in our data and previous work, we also identified factors seen in the broader literature but not previously reported in the context of learning environments. In a correlational analysis...
Multi Camera Visual Surveillance for Motion Detection Occlusion Handling Tracking and Event Recognition
Akman, Oytun; Alatan, Abdullah Aydın; Çiloğlu, Tolga (2008-10-05)
This paper presents novel approaches for background modeling, occlusion handling and event recognition using multi-camera configurations that can overcome the limitations of single-camera configurations. The main novelty in the proposed background modeling approach is building a multivariate Gaussian background model for each pixel of the reference camera by utilizing homography-related positions. Also, occlusion handling is achieved by generation of the top view via trifocal tensors, as a resu...
Fine-Grained Object Recognition and Zero-Shot Learning in Remote Sensing Imagery
Sumbul, Gencer; Cinbiş, Ramazan Gökberk; Aksoy, Selim (2018-02-01)
Fine-grained object recognition, which aims to identify the type of an object among a large number of subcategories, is an emerging application with the increasing resolution that exposes new details in image data. Traditional fully supervised algorithms fail to handle this problem where there is low between-class variance and high within-class variance for the classes of interest with small sample sizes. We study an even more extreme scenario named zero-shot learning (ZSL) in which no training example exists f...
Citation Formats
N. Churamani, S. Kalkan, and H. Güneş, “Spatio-Temporal Analysis of Facial Actions using Lifecycle-Aware Capsule Networks,” presented at the IEEE International Conference on Automatic Face and Gesture Recognition, India, 2021, Accessed: 00, 2023. [Online]. Available: http://iab-rubric.org/fg2021/pdfs/FG2021_program.pdf.