Audio Feature and Classifier Analysis for Efficient Recognition of Environmental Sounds

Okuyucu, Cigdem
Yazıcı, Adnan
Environmental sounds (ES) have distinctive characteristics, such as an unstructured nature and typically noise-like, flat spectra, which make the recognition task more difficult than for speech or music. Here, we perform an exhaustive feature and classifier analysis for the recognition of considerably similar ES categories and propose a best representative feature set that yields higher recognition accuracy. In the experiments, thirteen (13) ES categories, namely emergency alarm, car horn, gun, explosion, automobile, helicopter, water, wind, rain, applause, crowd, and laughter, are detected and tested based on eleven (11) audio features (MPEG-7 family, ZCR, MFCC, and combinations) by using the HMM and SVM classifiers. Extensive experiments have been conducted to demonstrate the effectiveness of these joint features for ES classification. Our experiments show that the joint feature set ASFCS-H (Audio Spectrum Flatness, Centroid, Spread, and Audio Harmonicity) is the best representative feature set, with an average F-measure value of 80.6%.
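
As a rough illustration of the kind of frame-level descriptors named in the abstract (spectral flatness, centroid, spread, and ZCR), the sketch below computes simplified versions with NumPy. It is not the authors' implementation and does not follow the normative MPEG-7 definitions, which use a log-frequency scale and per-band flatness; the function name `frame_features`, the Hann window, and the linear-frequency formulas are assumptions made only for this example.

```python
import numpy as np


def frame_features(frame, sample_rate):
    """Return simplified spectral descriptors for one audio frame.

    Assumption: linear-frequency analogues of the descriptors, not the
    MPEG-7 normative Audio Spectrum Centroid/Spread/Flatness.
    """
    frame = np.asarray(frame, dtype=float)
    eps = 1e-12  # guards against division by zero and log(0)

    # Power spectrum of the Hann-windowed frame.
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = power.sum() + eps

    # Centroid: power-weighted mean frequency (Hz).
    centroid = (freqs * power).sum() / total
    # Spread: power-weighted standard deviation around the centroid (Hz).
    spread = np.sqrt((((freqs - centroid) ** 2) * power).sum() / total)
    # Flatness: geometric mean over arithmetic mean of the power spectrum.
    flatness = np.exp(np.mean(np.log(power + eps))) / (np.mean(power) + eps)
    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    signs = np.signbit(frame)
    zcr = float(np.mean(signs[1:] != signs[:-1]))

    return {
        "centroid_hz": float(centroid),
        "spread_hz": float(spread),
        "flatness": float(flatness),
        "zcr": zcr,
    }


if __name__ == "__main__":
    sr = 16000
    n = np.arange(1024)
    tone = np.sin(2 * np.pi * 1000 * n / sr)            # tonal, alarm-like
    noise = np.random.default_rng(0).normal(size=1024)  # noise-like, rain-like
    print(frame_features(tone, sr))
    print(frame_features(noise, sr))
```

Under these assumptions, a pure tone yields a flatness close to zero while white noise yields a much higher value, which is the kind of contrast that makes flatness useful for separating tonal from noise-like sounds.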


Content-Based Classification and Segmentation of Mixed-Type Audio by Using MPEG-7 Features
Dogan, Ebru; Sert, Mustafa; Yazıcı, Adnan (2009-07-25)
This paper describes the development of a generated solution for classification and segmentation of broadcast news audio. A sound stream is segmented by classifying each sub-segment into silence, pure speech, music, environmental sound, speech over music, and speech over environmental sound classes in multiple steps. Support Vector Machines and Hidden Markov Models are employed for classification, and these models are trained by using different sets of MPEG-7 features. A series of tests was conducted on hand...
Panoramic recording and reproduction of multichannel audio using a circular microphone array
Hacıhabiboğlu, Hüseyin (2009-10-18)
Multichannel audio reproduction generally suffers from one or both of the following problems: i) the recorded audio has to be artificially manipulated to provide the necessary spatial cues, which reduces the consistency of the reproduced sound field with the actual one, and ii) reproduction is not panoramic, which degrades realism when the listener is not seated in a desired ideal position facing the center channel. A recording method using a circularly symmetric array of differential microphones, and a rep...
Localization Uncertainty in Time-Amplitude Stereophonic Reproduction
De Sena, Enzo; Cvetkovic, Zoran; Hacıhabiboğlu, Hüseyin; Moonen, Marc; van Waterschoot, Toon (Institute of Electrical and Electronics Engineers (IEEE), 2020-01-01)
This article studies the effects of inter-channel time and level differences in stereophonic reproduction on perceived localization uncertainty, which is defined as how difficult it is for a listener to tell where a sound source is located. Towards this end, a computational model of localization uncertainty is proposed first. The model calculates inter-aural time and level difference cues, and compares them to those associated to free-field point-like sources. The comparison is carried out using a particula...
Coteli, Mert Burkay; Hacıhabiboğlu, Hüseyin (2018-09-20)
Acoustic source separation refers to the extraction of individual source signals from microphone array recordings of multiple sources made in multipath environments such as rooms. The most straightforward approach to acoustic source separation involves spatial filtering via beamforming. While beamforming works well for a few sources and under low reverberation, its performance diminishes for a high number of sources and/or high reverberation. An informed acoustic source separation method based on the applic...
Stereophonic rendering of source distance using DWM FDN artificial reverberators
Mate Cid, Saul; Hacıhabiboğlu, Hüseyin; Cvetkovic, Zoran (2010-05-22)
Artificial reverberators are used in audio recording and production to enhance the perception of spaciousness. It is well known that reverberation is a key factor in the perception of the distance of a sound source. The ratio of direct and reverberant energies is one of the most important distance cues. A stereophonic artificial reverberator is proposed that allows panning the perceived distance of a sound source. The proposed reverberator is based on feedback delay network (FDN) reverberators and uses a pe...
Citation Formats
C. Okuyucu, M. Sert, and A. Yazıcı, “Audio Feature and Classifier Analysis for Efficient Recognition of Environmental Sounds,” 2013.