Hand-crafted versus learned representations for audio event detection
Date
2022-04-01
Author
Kucukbay, Selver Ezgi
Yazıcı, Adnan
Kalkan, Sinan
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Audio Event Detection (AED) pertains to identifying the types of events in audio signals. AED is essential for applications that must make decisions based on audio signals, which can be critical, for example, in health, surveillance and security applications. Despite the proven benefits of deep learning in obtaining the best representation for solving a problem, AED studies still generally employ hand-crafted representations even when deep learning is used to solve the AED task. Intrigued by this, we investigate whether hand-crafted representations (i.e. spectrogram, mel spectrogram, log mel spectrogram and mel frequency cepstral coefficients) are better than a representation learned using a Convolutional Autoencoder (CAE). To the best of our knowledge, our study is the first to ask this question and thoroughly compare feature representations for AED. To this end, we first find the best hop size and window size for each hand-crafted representation and then compare the optimized hand-crafted representations with CAE-learned representations. Our extensive analyses on a subset of the AudioSet dataset confirm the common practice: hand-crafted representations perform better than learned features by a large margin (approximately 30 AP). Moreover, we show that the commonly used window and hop sizes do not provide optimal performance for the hand-crafted representations.
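The hand-crafted representations compared above can be derived from a waveform in a single pipeline in which the window size and hop size are the only framing parameters. The following is a minimal, standard-library-only Python sketch, not the authors' implementation: the Hann window, the 2595·log10(1 + f/700) mel formula, the triangular filterbank, the natural-log compression and the unnormalised DCT-II are common textbook choices assumed here, and every function name is hypothetical. It uses a naive O(N²) DFT so that it runs without NumPy.

```python
import cmath
import math


def frame_signal(x, win_size, hop_size):
    """Split x into overlapping frames of win_size samples, hop_size samples apart."""
    n_frames = max(0, 1 + (len(x) - win_size) // hop_size)
    return [x[i * hop_size : i * hop_size + win_size] for i in range(n_frames)]


def power_spectrogram(x, win_size, hop_size):
    """Spectrogram: |DFT|^2 of each Hann-windowed frame, bins 0..win_size//2.

    Uses a naive O(N^2) DFT so the sketch needs only the standard library.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_size) for n in range(win_size)]
    spec = []
    for frame in frame_signal(x, win_size, hop_size):
        xw = [s * w for s, w in zip(frame, window)]
        row = []
        for k in range(win_size // 2 + 1):
            X_k = sum(xw[n] * cmath.exp(-2j * math.pi * k * n / win_size) for n in range(win_size))
            row.append(abs(X_k) ** 2)
        spec.append(row)
    return spec


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, win_size, sr):
    """Triangular filters equally spaced on the mel scale between 0 Hz and sr/2."""
    mel_max = hz_to_mel(sr / 2)
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sr / win_size for k in range(win_size // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, centre, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_freqs:
            if lo <= f <= centre:
                row.append((f - lo) / (centre - lo))  # rising edge of the triangle
            elif centre < f <= hi:
                row.append((hi - f) / (hi - centre))  # falling edge of the triangle
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_spectrogram(spec, bank):
    """Mel spectrogram: project each power-spectrum frame onto the mel filters."""
    return [[sum(w * p for w, p in zip(filt, frame)) for filt in bank] for frame in spec]


def log_mel_spectrogram(mel_spec, eps=1e-10):
    """Log mel spectrogram: natural log of the mel energies; eps avoids log(0)."""
    return [[math.log(v + eps) for v in frame] for frame in mel_spec]


def mfcc(log_mel, n_mfcc):
    """MFCCs: unnormalised DCT-II of each log-mel frame, keeping the first n_mfcc terms."""
    out = []
    for frame in log_mel:
        M = len(frame)
        out.append([
            sum(v * math.cos(math.pi * k * (m + 0.5) / M) for m, v in enumerate(frame))
            for k in range(n_mfcc)
        ])
    return out


# Usage: a 1 kHz test tone sampled at 8 kHz (illustrative values, not the paper's data).
sr = 8000
x = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(2048)]
spec = power_spectrogram(x, win_size=256, hop_size=128)
bank = mel_filterbank(20, 256, sr)
feats = mfcc(log_mel_spectrogram(mel_spectrogram(spec, bank)), n_mfcc=13)
```

Changing `win_size` trades frequency resolution (bin spacing `sr / win_size`, 31.25 Hz here) against time resolution, while `hop_size` sets the frame rate; with the values above, 2048 samples yield `1 + (2048 - 256) // 128 = 15` frames. The 256/128 values are arbitrary illustration values, not the settings found optimal in this study.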
Subject Keywords
Audio event detection, Audio event classification, Deep learning, Log mel spectrogram, Mel spectrogram, Spectrogram, MFCC, Classification
URI
https://hdl.handle.net/11511/97158
Journal
MULTIMEDIA TOOLS AND APPLICATIONS
DOI
https://doi.org/10.1007/s11042-022-12873-5
Collections
Department of Computer Engineering, Article
Suggestions
Perceptual Soundfield Reconstruction in Three Dimensions via Sound Field Extrapolation
Erdem, Ege; De Sena, Enzo; Hacıhabiboğlu, Hüseyin; Cvetkovic, Zoran (2019-05-01)
Perceptual sound field reconstruction (PSR) is a spatial audio recording and reproduction method based on the application of stereophonic panning laws in microphone array design. PSR allows rendering a perceptually veridical and stable auditory perspective in the horizontal plane of the listener, and involves recording using near-coincident microphone arrays. This paper extends the PSR concept to three dimensions using sound field extrapolation carried out in the spherical-harmonic domain. Sound field render...
3D perceptual soundfield reconstruction via sound field extrapolation
Erdem, Ege; Hacıhabiboğlu, Hüseyin; Department of Multimedia Informatics (2020)
Perceptual sound field reconstruction (PSR) is a spatial audio recording and reproduction method based on the application of stereophonic panning laws in microphone array design. PSR allows rendering a perceptually veridical and stable auditory perspective in the horizontal plane of the listener, and involves recording using near-coincident microphone arrays. This thesis extends the two-dimensional PSR concept to three dimensions and allows reconstructing an arbitrary sound field based on measurements with a...
Blind source separation and directional audio synthesis for binaural auralization of multiple sound sources using microphone array recordings
Günel Kılıç, Banu; Hacıhabiboğlu, Hüseyin (2008-01-01)
Microphone array signal processing techniques are extensively used for sound source localisation, acoustical characterisation and sound source separation, which are related to audio analysis. However, the use of microphone arrays for auralisation, which is generally related to synthesis, has been limited so far. This paper proposes a method for binaural auralisation of multiple sound sources based on blind source separation (BSS) and binaural audio synthesis. A BSS algorithm is introduced that exploits the ...
Downlink transmission techniques for multi user multi input multi output wireless communications
Coşkun, Adem; Candan, Çağatay; Department of Electrical and Electronics Engineering (2007)
Multi-user MIMO (MIMO-MU) communication techniques make use of available channel state information at the transmitter to mitigate inter-user interference. The goal of these techniques is to minimize the interference at the mobile stations by applying a precoding operation. In this thesis, techniques available in the literature, such as Channel Decomposition, SINR Balancing and Joint-MMSE optimization, are compared. Novel techniques for the MIMO multi-user downlink communication systems, w...
Cluster based user scheduling schemes to exploit multiuser diversity in wireless broadcast channels
Soydan, Yusuf; Candan, Çağatay; Department of Electrical and Electronics Engineering (2008)
Diversity methods are used to improve the reliability of the communication between transmitter and receiver. These methods use redundancy to reduce the errors in the communication link. Apart from the conventional diversity methods, multiuser diversity aims to maximize the sum capacity of a multi-user system. To benefit from multiuser diversity, the opportunistic scheduling method grants channel access to the user with the best channel quality among all users. Therefore, the cumulative sum ...
Citation (IEEE)
S. E. Kucukbay, A. Yazıcı, and S. Kalkan, “Hand-crafted versus learned representations for audio event detection,” Multimedia Tools and Applications, 2022. [Online]. Available: https://hdl.handle.net/11511/97158.