Relaxed Spatio-Temporal Deep Feature Aggregation for Real-Fake Expression Prediction

2017-10-29
Ozkan, Savas
Akar, Gözde
Frame-level visual features are generally aggregated in time with techniques such as LSTM, Fisher Vectors, and NetVLAD to produce a robust video-level representation. Here we introduce a learnable aggregation technique whose primary objective is to retain, in the representation, the short-time temporal structure between frame-level features and their spatial interdependencies. It can also be easily adapted to cases where training samples are very scarce. We evaluate the method on a real-fake expression prediction dataset to demonstrate its superiority. Our method obtains a score of 65% on the test dataset in the official MAP evaluation, which differs by only one misclassified decision from the best reported result in the ChaLearn Challenge (i.e. 66.7%). Lastly, we believe that this method can be extended in the future to other problems such as action/event recognition.

Suggestions

Adaptive mean-shift for automated multi object tracking
Beyan, C.; Temizel, Alptekin (2012-01-01)
Mean-shift tracking plays an important role in computer vision applications because of its robustness, ease of implementation and computational efficiency. In this study, a fully automatic multiple-object tracker based on the mean-shift algorithm is presented. Foreground is extracted using a mixture of Gaussians followed by shadow and noise removal to initialise the object trackers and also used as a kernel mask to make the system more efficient by decreasing the search area and the number of iterations to conve...
Enhancing the accuracy of the interpolations and anterpolations in MLFMA
Ergül, Özgür Salih (Institute of Electrical and Electronics Engineers (IEEE), 2006-01-01)
We present an efficient technique to reduce the interpolation and anterpolation (transpose interpolation) errors in the aggregation and disaggregation processes of the multilevel fast multipole algorithm (MLFMA), which is based on the sampling of the radiated and incoming fields over all possible solid angles, i.e., all directions on the sphere. The fields sampled on the sphere are subject to various operations, such as interpolation, aggregation, translation, disaggregation, anterpolation, and integration....
3-D structure assisted reference view generation for H.264 based multi-view video coding
Gedik, O. Serdar; Oezkalayci, Burak; Alatan, Abdullah Aydın (2007-06-13)
A 3D geometry-based multi-view video coding (MVC) method is proposed. In order to utilize the spatial redundancies between multiple views, the scene geometry is estimated as dense depth maps. The dense depth estimation problem is modeled by using a Markov random field (MRF) and solved via the belief propagation algorithm. Relying on these depth maps of the scene, novel view estimates of the intermediate views of the multi-view set are obtained with a 3D warping algorithm, which also performs hole-filling in ...
3D object recognition from range images using transform invariant object representation
Akagündüz, Erdem; Ulusoy, İlkay (Institution of Engineering and Technology (IET), 2010-10-28)
3D object recognition is performed using a scale and orientation invariant feature extraction method and a scale and orientation invariant topological representation. 3D surfaces are represented by sparse, repeatable, informative and semantically meaningful 3D surface structures, which are called multiscale features. These features are extracted with their scale (metric size and resolution) using the classified scale-space of 3D surface curvatures. Triplets of these features are used to represent the surfac...
Fusion of Image Segmentations under Markov Random Fields
Karadag, Ozge Oztimur; Yarman Vural, Fatoş Tunay (2014-08-28)
In this study, a fast and efficient consensus segmentation method is proposed which fuses a set of baseline segmentation maps under an unsupervised Markov Random Fields (MRF) framework. The degree of consensus among the segmentation maps is estimated as the relative frequency of co-occurrences among the adjacent segments. Then, these relative frequencies are used to construct the energy function of an unsupervised MRF model. It is well-known that the MRF framework is commonly used for formulating the spatial r...
Citation Formats
S. Ozkan and G. Akar, “Relaxed Spatio-Temporal Deep Feature Aggregation for Real-Fake Expression Prediction,” 2017, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/47843.