Automatic Speech Emotion Recognition using Auditory Models with Binary Decision Tree and SVM

Date

2014-08-28

Author

Yuncu, Enes
Hacıhabiboğlu, Hüseyin
Bozsahin, Cem

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Affective computing is the design and development of algorithms that enable computers to recognize the emotions of their users and respond in a natural way. Speech, along with facial gestures, is one of the primary modalities through which humans express their emotions. While emotional cues in speech are available to an interlocutor in a dyadic conversation setting, their subjective recognition is far from accurate. This is due to the human auditory system, which is primarily non-linear and adaptive. This paper describes an automatic speech emotion recognition algorithm based on a computational model of the human auditory system. The devised system is tested on three emotional speech datasets. The results of a subjective recognition task are also reported. It is shown that the proposed algorithm provides recognition rates comparable to those of human raters.

Subject Keywords

Speech, Speech recognition, Emotion recognition, Filter banks, Databases, Feature extraction, Modulation

URI

https://hdl.handle.net/11511/32503

DOI

https://doi.org/10.1109/icpr.2014.143

Collections

Graduate School of Informatics, Conference / Seminar

Suggestions

Automated learning rate search using batch-level cross-validation Kabakcı, Duygu; Akbaş, Emre; Department of Computer Engineering (2019) Deep convolutional neural networks are being widely used in computer vision tasks, such as object recognition and detection, image segmentation and face recognition, with a variety of architectures. Deep learning researchers and practitioners have accumulated a significant amount of experience on training a wide variety of architectures on various datasets. However, given a specific network model and a dataset, obtaining the best model (i.e. the model giving the smallest test set error) while keeping the tr...
Wireless speech recognition using fixed point mixed excitation linear prediction (MELP) vocoder Acar, D; Karci, MH; Ilk, HG; Demirekler, Mübeccel (2002-07-19) A bit stream based front-end for a wireless speech recognition system that operates on a fixed point mixed excitation linear prediction (MELP) vocoder is presented in this paper. Speaker dependent, isolated word recognition accuracies from conventional and bit stream based front-end systems are obtained and their statistical significance is discussed. Feature parameters are extracted from original (wireline) and decoded speech (conventional) and from the quantized spectral information (bit stream) of t...
Speech emotion recognition using auditory models Yüncü, Enes; Çakır, Murat Perit; Department of Cognitive Sciences (2013) With the advent of computational technology, human computer interaction (HCI) has gone beyond simple logical calculations. Affective computing aims to improve human computer interaction in a mental state level allowing computers to adapt their responses according to human needs. As such, affective computing aims to recognize emotions by capturing cues from visual, auditory, tactile and other biometric signals recorded from humans. Emotions play a crucial role in modulating how humans experience and interact...
Data-driven image captioning via salient region discovery Kilickaya, Mert; Akkuş, Burak Kerim; Çakıcı, Ruket; Erdem, Aykut; Erdem, Erkut; İKİZLER CİNBİŞ, NAZLI (Institution of Engineering and Technology (IET), 2017-09-01) In the past few years, automatically generating descriptions for images has attracted a lot of attention in computer vision and natural language processing research. Among the existing approaches, data-driven methods have been proven to be highly effective. These methods compare the given image against a large set of training images to determine a set of relevant images, then generate a description using the associated captions. In this study, the authors propose to integrate an object-based semantic image r...
A comparison on textured motion classification Oztekin, Kaan; Akar, Gözde (2006-01-01) Textured motion (generally known as dynamic or temporal texture) analysis, classification, synthesis, segmentation and recognition are popular research areas in several fields such as computer vision, robotics, animation and multimedia databases. In the literature, several algorithms, both stochastic and deterministic, have been proposed to characterize these textured motions. However, there is no study which compares the performances of these algorithms. In this paper, we carry out a complete compari...

Citation Formats

E. Yuncu, H. Hacıhabiboğlu, and C. Bozsahin, “Automatic Speech Emotion Recognition using Auditory Models with Binary Decision Tree and SVM,” 2014, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/32503.