Human presence detection in emergency situations using deep learning based audio-visual systems

Geneci, İzlen
Emergency event detection in surveillance systems has drawn considerable research attention in recent years. Existing methods rely mostly on visual data to identify abnormal events, since public settings are often equipped with visual sensors only. In an emergency, however, sound information can also be exploited: when the line of sight is occluded, sound waves can still penetrate obstacles to some extent. Conversely, visual analysis can help when the audio is noisy and the scene is crowded. Given the recent rapid growth of deep learning, the shift from single-modality to multimodal learning has therefore become crucial. In this work, the audio analysis and the visual analysis were performed separately. In the audio-based analysis, audio was split into samples using a sliding window technique to capture the brief window in which a target audio class occurs. As a result, a system operating in real time can recognize emergency circumstances even when the target sound occurs only briefly. For the human sound classes "Speech", "Scream", and "Cry", the minimum sliding window sizes were 0.25 s, 1 s, and 0.30 s, respectively. In the visual analysis, face detection was conducted along with facial alignment using five facial landmarks. The AP for face detection was 77% on the WIDER Face dataset (IoU=0.5). Using the detected faces, facial expression recognition (FER) as well as age and gender estimation were performed with an attention-based method. For seven basic emotions, 64.14% accuracy was achieved on the AffectNet dataset. The combination of these audio-based and visual-based systems eliminates the limitations of perceptual tasks in each individual modality.
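The sliding window step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `sliding_windows`, the hop size, and the 16 kHz sample rate are assumptions made for illustration, and only the per-class minimum window sizes are taken from the abstract.

```python
import numpy as np

# Minimum window sizes in seconds, as reported in the abstract.
MIN_WINDOW_S = {"Speech": 0.25, "Scream": 1.0, "Cry": 0.30}


def sliding_windows(signal, sample_rate, window_s, hop_s):
    """Split a 1-D audio signal into fixed-length, possibly overlapping windows.

    The hop size is an assumption; the abstract does not state the overlap used.
    Returns an array of shape (num_windows, window_length_in_samples).
    """
    signal = np.asarray(signal)
    win = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if win <= 0 or hop <= 0:
        raise ValueError("window_s and hop_s must map to at least one sample")
    if len(signal) < win:
        # Signal is shorter than one window: no complete window exists.
        return np.empty((0, win), dtype=signal.dtype)
    starts = np.arange(0, len(signal) - win + 1, hop)
    return np.stack([signal[s:s + win] for s in starts])


if __name__ == "__main__":
    sample_rate = 16000  # assumed sample rate
    audio = np.zeros(sample_rate, dtype=np.float32)  # one second of silence
    for label, window_s in MIN_WINDOW_S.items():
        # Assumed 50% overlap between consecutive windows.
        windows = sliding_windows(audio, sample_rate, window_s, window_s / 2)
        print(label, windows.shape)
```

Each row of the returned array would then be passed to an audio classifier. A shorter window lets a brief target sound fill most of a sample, at the cost of less context per sample.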


Automated Moving Object Classification in Wireless Multimedia Sensor Networks
Civelek, Muhsin; Yazıcı, Adnan (2017-02-15)
The use of wireless multimedia sensor networks (WMSNs) for surveillance applications has attracted the interest of many researchers. As with traditional sensor networks, it is easy to deploy and operate WMSNs. With inclusion of multimedia devices in wireless sensor networks, it is possible to provide data to users that is more meaningful than that provided by scalar sensor-based systems alone; however, producing, storing, processing, analyzing, and transmitting multimedia data in sensor networks requires co...
Pedestrian zone anomaly detection by non-parametric temporal modelling
Gündüz, Ayşe Elvan; Taşkaya Temizel, Tuğba; Temizel, Alptekin (2014-08-29)
With the increasing focus on safety and security in public areas, anomaly detection in video surveillance systems has become increasingly important. In this paper, we describe a method that models the temporal behavior and detects behavioral anomalies in the scene using probabilistic graphical models. The Coupled Hidden Markov Model (CHMM) method that we use shows that sparse features obtained via feature detection and description algorithms are suitable for modeling the temporal behavior patterns and ...
Onal, Itir; Kardas, Karani; Rezaeitabar, Yousef; Bayram, Ulya; Bal, Murat; Ulusoy, İlkay; Cicekli, Nihan Kesim (2013-07-19)
This paper presents a framework for detecting complex events in surveillance videos. Moving objects in the foreground are detected in the object detection component of the system. Whether these foregrounds are human or not is decided in the object recognition component. Then each detected object is tracked and labeled in the object tracking component, in which true labeling of objects in the occlusion situation is also provided. The extracted information is fed to the event detection component. Rule based e...
Guldogan, M. B.; Gustafsson, F.; Orguner, Umut; Bjorklund, S.; Petersson, H.; Nezirovic, A. (2011-05-27)
Monitoring and tracking human activities around restricted areas is an important issue in security and surveillance applications. The movement of different parts of the human body generates unique micro-Doppler features which can be extracted effectively using joint time-frequency analysis. In this paper, we describe the simultaneous tracking of both location and micro-Doppler features of a human using particle filters (PF). The results obtained using the data from a 77 GHz radar prove the successful usage ...
Feature Extraction and Object Classification for Target Identification at Wireless Multimedia Sensor Networks
Civelek, Muhsin; Yilmazer, Cengiz; Yazıcı, Adnan; Korkut, Fazli Oncul (2014-04-25)
This paper investigates processes for automatically identifying targets without personnel intervention in wireless multimedia sensor networks. Methods are proposed to extract the features of an object from multimedia data and to classify the target type based on the extracted features. The success of the proposed methods is tested by implementing a MATLAB application, and the results are presented in this paper.
Citation Formats
İ. Geneci, “Human presence detection in emergency situations using deep learning based audio-visual systems,” M.S. - Master of Science, Middle East Technical University, 2022.