Speaker and posture classification using instantaneous acoustic features of breath signals

Date

2019

Author

İlerialkan, Atı

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Acoustic features extracted from speech are widely used for problems such as biometric speaker identification and first-person activity detection. However, the use of speech data raises privacy concerns because the speech content itself is explicitly available. In this thesis, we propose a method for speaker and posture classification using intra-speech breathing sounds. Instantaneous acoustic side information was extracted from breath instances using the Hilbert-Huang transform. Instantaneous frequency, magnitude, and phase features were computed from the intrinsic mode functions, and different combinations of these features were fed into a CNN-RNN network for classification. We also created a publicly available breath dataset, BreathBase, for the experiments in this thesis and for future work. BreathBase contains more than 5000 breath instances detected in recordings of 20 participants reading pre-prepared random pseudo texts in 5 different postures with 4 different microphones. Using side information acquired from the breath sections of speech, the method achieves 87% speaker classification accuracy and 98% posture classification accuracy among 20 speakers. The proposed method outperformed several other methods, including support vector machines, long short-term memory networks, and a combination of k-nearest neighbor and dynamic time warping techniques.
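
The instantaneous features named in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes an intrinsic mode function (IMF) is already available and skips the empirical mode decomposition step of the Hilbert-Huang transform. A synthetic 50 Hz sine stands in for the IMF, and the function name `instantaneous_features` and the sampling rate are hypothetical choices made for illustration only.

```python
import numpy as np
from scipy.signal import hilbert


def instantaneous_features(imf, fs):
    """Hilbert spectral analysis of one IMF (illustrative helper, not from the thesis).

    Returns instantaneous magnitude, unwrapped phase in radians,
    and instantaneous frequency in Hz (one sample shorter than the input).
    """
    analytic = hilbert(imf)                   # analytic signal x(t) + j*H{x}(t)
    magnitude = np.abs(analytic)              # instantaneous magnitude (envelope)
    phase = np.unwrap(np.angle(analytic))     # instantaneous phase, unwrapped
    frequency = np.diff(phase) * fs / (2.0 * np.pi)  # phase derivative in Hz
    return magnitude, phase, frequency


# Assumption: a synthetic single-component signal stands in for a real IMF.
fs = 1000                                     # assumed sampling rate in Hz
t = np.arange(0, 1.0, 1.0 / fs)
imf = 0.5 * np.sin(2 * np.pi * 50 * t)

magnitude, phase, frequency = instantaneous_features(imf, fs)
print(np.median(magnitude), np.median(frequency))
```

In a full pipeline, each breath instance would first be decomposed into several IMFs, and a function like this would be applied to each IMF before the features are passed to a classifier.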

Subject Keywords

Hilbert-Huang transform, speaker recognition, posture recognition, instantaneous frequency

URI

http://etd.lib.metu.edu.tr/upload/12624754/index.pdf
https://hdl.handle.net/11511/45430

Collections

Graduate School of Informatics, Thesis

Suggestions


ACOUSTIC SOURCE SEPARATION USING RIGID SPHERICAL MICROPHONE ARRAYS VIA SPATIALLY WEIGHTED ORTHOGONAL MATCHING PURSUIT Coteli, Mert Burkay; Hacıhabiboğlu, Hüseyin (2018-09-20) Acoustic source separation refers to the extraction of individual source signals from microphone array recordings of multiple sources made in multipath environments such as rooms. The most straightforward approach to acoustic source separation involves spatial filtering via beamforming. While beamforming works well for a few sources and under low reverberation, its performance diminishes for a high number of sources and/or high reverberation. An informed acoustic source separation method based on the applic...
Sound source localization: Conventional methods and intensity vector direction exploitation Günel Kılıç, Banu; Hacıhabiboğlu, Hüseyin (IGI Global, 2011-01-01) Automatic sound source localization has recently gained interest due to its various applications that range from surveillance to hearing aids, and teleconferencing to human computer interaction. Automatic sound source localization may refer to the process of determining only the direction of a sound source, which is known as the direction-of-arrival estimation, or also its distance in order to obtain its coordinates. Various methods have previously been proposed for this purpose. Many of these methods use t...
Blind source separation and directional audio synthesis for binaural auralization of multiple sound sources using microphone array recordings Günel Kılıç, Banu; Hacıhabiboğlu, Hüseyin (2008-01-01) Microphone array signal processing techniques are extensively used for sound source localisation, acoustical characterisation and sound source separation, which are related to audio analysis. However, the use of microphone arrays for auralisation, which is generally related to synthesis, has been limited so far. This paper proposes a method for binaural auralisation of multiple sound sources based on blind source separation (BSS) and binaural audio synthesis. A BSS algorithm is introduced that exploits the ...
Detection and tracking of dim signals for underwater applications Ermeydan, Esra Şengün; Demirekler, Mübeccel; Department of Electrical and Electronics Engineering (2010) Detection and tracking of signals used in sonar applications in noisy environment is the focus of this thesis. We have concentrated on the low Signal-to-Noise Ratio (SNR) case where the conventional detection methods are not applicable. Furthermore, it is assumed that the duty cycle is relatively low. In the problem that is of concern the carrier frequency, pulse repetition interval (PRI) and the existence of the signal are not known. The unknown character of PRI makes the problem challenging since it means...
SPEECH DETECTION ON BROADCAST AUDIO Zubari, Unal; Ozan, Ezgi Can; Acar, Banu Oskay; Çiloğlu, Tolga; Esen, Ersin; Ates, Tugrul K.; Onur, Duygu Oskay (2010-08-27) Speech boundary detection contributes to performance of speech based applications such as speech recognition and speaker recognition. Speech boundary detector implemented in this study works on broadcast audio as a pre-processor module of a keyword spotter. Speech boundary detection is handled in 3 steps. At first step, audio data is segmented into homogeneous regions in an unsupervised manner. After an ACTIVITY/NON-ACTIVITY decision is made for each region, ACTIVITY regions are classified as Speech/Non-spe...

Citation Formats

A. İlerialkan, “Speaker and posture classification using instantaneous acoustic features of breath signals,” Thesis (M.S.) -- Graduate School of Informatics, Multimedia Informatics, Middle East Technical University, 2019.