Speaker and posture classification using instantaneous acoustic features of breath signals
Date
2019
Author
İlerialkan, Atı
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Acoustic features extracted from speech are widely used for problems such as biometric speaker identification or first-person activity detection. However, the use of speech data raises privacy concerns because the speech content is explicitly available. In this thesis, we propose a method for speaker and posture classification using intra-speech breathing sounds. Instantaneous acoustic side information was extracted from breath instances using the Hilbert-Huang transform: instantaneous frequency, magnitude, and phase features were extracted from intrinsic mode functions, and different combinations of these were fed into a CNN-RNN network for classification. We also created a publicly available breath dataset, BreathBase, for both the experiments in this thesis and future work. BreathBase contains more than 5000 breath instances detected in recordings of 20 participants reading pre-prepared random pseudo-texts in 5 different postures with 4 different microphones. Using side information acquired from the breath sections of speech, the method obtains 87% speaker classification accuracy and 98% posture classification accuracy among 20 speakers. The proposed method outperformed various other methods, such as support vector machines, long short-term memory networks, and a combination of k-nearest neighbor and dynamic time warping techniques.
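To make the feature-extraction step in the abstract concrete, the following is a minimal sketch of the Hilbert spectral analysis half of the Hilbert-Huang transform. It is not the thesis's implementation. It assumes the input is already a single intrinsic mode function, so the empirical mode decomposition step is omitted, and it uses a synthetic 200 Hz tone in place of a real breath recording. The function names `analytic_signal` and `instantaneous_features`, the sample rate, and the test signal are all hypothetical choices made for illustration.

```python
import numpy as np


def analytic_signal(x):
    """Build the analytic signal of a real sequence via the FFT.

    Negative-frequency bins are zeroed and positive-frequency bins doubled,
    which is the standard discrete construction of x + j * Hilbert(x).
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0  # keep the DC bin unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0  # keep the Nyquist bin unchanged
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def instantaneous_features(component, sample_rate):
    """Return instantaneous magnitude, phase (rad), and frequency (Hz).

    `component` is assumed to be one intrinsic mode function; a real
    pipeline would obtain it from empirical mode decomposition first.
    """
    z = analytic_signal(component)
    magnitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    # Frequency is the time derivative of phase, approximated sample to sample.
    frequency = np.diff(phase) * sample_rate / (2.0 * np.pi)
    return magnitude, phase, frequency


# Synthetic stand-in for one intrinsic mode function (assumption, not thesis data).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # one second of samples
component = 0.5 * np.cos(2.0 * np.pi * 200.0 * t)

magnitude, phase, frequency = instantaneous_features(component, sample_rate)
print(float(np.median(magnitude)), float(np.median(frequency)))
```

In a full pipeline, per-component feature sequences like these would be stacked and passed to a classifier; the abstract names a CNN-RNN network for that role, which this sketch does not attempt to reproduce.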
Subject Keywords
Hilbert-Huang transform, speaker recognition, posture recognition, instantaneous frequency
URI
http://etd.lib.metu.edu.tr/upload/12624754/index.pdf
https://hdl.handle.net/11511/45430
Collections
Graduate School of Informatics, Thesis
Suggestions
ACOUSTIC SOURCE SEPARATION USING RIGID SPHERICAL MICROPHONE ARRAYS VIA SPATIALLY WEIGHTED ORTHOGONAL MATCHING PURSUIT
Coteli, Mert Burkay; Hacıhabiboğlu, Hüseyin (2018-09-20)
Acoustic source separation refers to the extraction of individual source signals from microphone array recordings of multiple sources made in multipath environments such as rooms. The most straightforward approach to acoustic source separation involves spatial filtering via beamforming. While beamforming works well for a few sources and under low reverberation, its performance diminishes for a high number of sources and/or high reverberation. An informed acoustic source separation method based on the applic...
Sound source localization: Conventional methods and intensity vector direction exploitation
Günel Kılıç, Banu; Hacıhabiboğlu, Hüseyin (IGI Global, 2011-01-01)
Automatic sound source localization has recently gained interest due to its various applications that range from surveillance to hearing aids, and teleconferencing to human computer interaction. Automatic sound source localization may refer to the process of determining only the direction of a sound source, which is known as the direction-of-arrival estimation, or also its distance in order to obtain its coordinates. Various methods have previously been proposed for this purpose. Many of these methods use t...
Blind source separation and directional audio synthesis for binaural auralization of multiple sound sources using microphone array recordings
Günel Kılıç, Banu; Hacıhabiboğlu, Hüseyin (2008-01-01)
Microphone array signal processing techniques are extensively used for sound source localisation, acoustical characterisation and sound source separation, which are related to audio analysis. However, the use of microphone arrays for auralisation, which is generally related to synthesis, has been limited so far. This paper proposes a method for binaural auralisation of multiple sound sources based on blind source separation (BSS) and binaural audio synthesis. A BSS algorithm is introduced that exploits the ...
Detection and tracking of dim signals for underwater applications
Ermeydan, Esra Şengün; Demirekler, Mübeccel; Department of Electrical and Electronics Engineering (2010)
Detection and tracking of signals used in sonar applications in noisy environment is the focus of this thesis. We have concentrated on the low Signal-to-Noise Ratio (SNR) case where the conventional detection methods are not applicable. Furthermore, it is assumed that the duty cycle is relatively low. In the problem that is of concern the carrier frequency, pulse repetition interval (PRI) and the existence of the signal are not known. The unknown character of PRI makes the problem challenging since it means...
SPEECH DETECTION ON BROADCAST AUDIO
Zubari, Unal; Ozan, Ezgi Can; Acar, Banu Oskay; Çiloğlu, Tolga; Esen, Ersin; Ates, Tugrul K.; Onur, Duygu Oskay (2010-08-27)
Speech boundary detection contributes to performance of speech based applications such as speech recognition and speaker recognition. Speech boundary detector implemented in this study works on broadcast audio as a pre-processor module of a keyword spotter. Speech boundary detection is handled in 3 steps. At first step, audio data is segmented into homogeneous regions in an unsupervised manner. After an ACTIVITY/NON-ACTIVITY decision is made for each region, ACTIVITY regions are classified as Speech/Non-spe...
Citation Formats
IEEE
A. İlerialkan, “Speaker and posture classification using instantaneous acoustic features of breath signals,” Thesis (M.S.) -- Graduate School of Informatics. Multimedia Informatics., Middle East Technical University, 2019.