MULTI-CLASS CLASSIFICATION OF VOICE DISORDERS USING DEEP LEARNING
Download
Mehtab_Thesis.pdf
Date
2024-1
Author
Rahman, Mehtab Ur
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Voice disorders are a widespread issue affecting people of all ages, and accurate diagnosis is crucial for effective treatment. Recent advances in Artificial Intelligence based audio and speech processing have led to a focus on binary and multi-class classification of voice disorders. However, existing work has mostly focused on the binary (two-class) classification of voice disorders. Some researchers have also explored multi-class classification, but their results are not promising. The primary objective of this work was to enhance the performance of a machine learning-powered system for voice disorder diagnosis in multi-class classification. This research proposes a two-stage framework for the binary and multi-class classification of voice disorders. First, high-level feature embeddings are extracted from spectrograms of voice data using the pre-trained VGGish model. In the second stage, we employ four classifiers: Support Vector Machine (SVM), Logistic Regression (LR), Multi-layer Perceptron (MLP), and an Ensemble Classifier (EC). Separate experiments are conducted for male speakers, female speakers, and both combined. We evaluated our models on a subset of the Saarbruecken Voice Database (SVD). In binary classification, VGGish-SVM achieved the highest accuracy for male speakers, while VGGish-EC performed best for female speakers. In multi-class classification, VGGish-SVM outperformed other models in terms of mean accuracy for both genders, but VGGish-EC demonstrated the best performance for minority classes. We conducted a comparative analysis against baseline methods, including the mel frequency cepstral coefficient (MFCC), MFCC-glottal features, and features extracted with the wav2vec and HuBERT models, where SVM was employed as the classifier. The findings show that our approach outperforms these baselines in all classification tasks except for the binary classification of healthy vs. disordered for female speakers.
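The second stage of the framework above can be illustrated with a minimal sketch. The abstract does not say how the Ensemble Classifier combines its members, so soft voting (averaging class probabilities) is an assumption here, and the probability tables are made-up stand-ins for the outputs of SVM, LR and MLP trained on VGGish embeddings. Per-class recall is included because it is one way to inspect minority-class behaviour alongside mean accuracy.

```python
from typing import Dict, List, Sequence

Probs = Sequence[Sequence[float]]  # one row of class probabilities per sample


def soft_vote(prob_sets: Sequence[Probs]) -> List[int]:
    """Average class probabilities across classifiers, then take the argmax."""
    n_models = len(prob_sets)
    predictions = []
    for i in range(len(prob_sets[0])):
        n_classes = len(prob_sets[0][i])
        mean = [sum(model[i][c] for model in prob_sets) / n_models
                for c in range(n_classes)]
        predictions.append(max(range(n_classes), key=mean.__getitem__))
    return predictions


def per_class_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, float]:
    """Fraction of each true class that was predicted correctly."""
    recall = {}
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recall[c] = sum(1 for i in idx if y_pred[i] == c) / len(idx)
    return recall


# Hypothetical probabilities for 4 recordings and 3 classes (illustrative only).
svm_probs = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
lr_probs = [[0.6, 0.3, 0.1], [0.3, 0.6, 0.1], [0.3, 0.3, 0.4], [0.3, 0.5, 0.2]]
mlp_probs = [[0.8, 0.1, 0.1], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7], [0.2, 0.6, 0.2]]

y_true = [0, 1, 2, 0]
y_pred = soft_vote([svm_probs, lr_probs, mlp_probs])
print(y_pred)                            # [0, 1, 2, 1]
print(per_class_recall(y_true, y_pred))  # {0: 0.5, 1: 1.0, 2: 1.0}
```

In practice the probability tables would come from classifiers fitted on real embeddings; only the combination and scoring steps are shown here.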
Additionally, we proposed a framework for the multi-class classification of voice disorders using OpenL3 embeddings. A pre-trained OpenL3 model is utilized to extract high-level embedding features from the mel spectrogram. Different classifiers are then evaluated after Neighbourhood Component Analysis (NCA) based feature selection. Random Forest (RF), Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) are utilized separately to classify the selected features. The evaluation and comparison are performed on a balanced subset of the SVD. Without any speech enhancement preprocessing, our best model, OpenL3-KNN, improves on existing work by 4.9% in accuracy and 8.7% in F1 score.
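The select-then-classify steps described above can be sketched as follows. This is a toy illustration under stated assumptions, not the thesis implementation: the feature weights are hypothetical stand-ins for the weights an NCA-based selector would produce, the 4-dimensional vectors stand in for OpenL3 embeddings, and macro-averaged F1 is assumed as the F1 variant.

```python
import math
from collections import Counter


def select_top_features(weights, k):
    """Indices of the k largest weights (stand-in for NCA-derived weights)."""
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return sorted(ranked[:k])


def project(x, idx):
    """Keep only the selected feature dimensions of one embedding."""
    return [x[j] for j in idx]


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k nearest training points (Euclidean distance)."""
    nearest = sorted((math.dist(t, x), label) for t, label in zip(train_X, train_y))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy embeddings: dimensions 0 and 2 carry class information, 1 and 3 are noise.
train_X = [
    [0.0, 5.0, 0.1, 9.0], [0.2, 1.0, 0.0, 2.0], [0.1, 7.0, 0.2, 4.0],  # class 0
    [1.0, 6.0, 1.1, 1.0], [0.9, 2.0, 1.0, 8.0], [1.1, 4.0, 0.9, 5.0],  # class 1
    [2.0, 3.0, 2.1, 3.0], [2.1, 8.0, 1.9, 7.0], [1.9, 0.0, 2.0, 6.0],  # class 2
]
train_y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
test_X = [[0.1, 9.0, 0.1, 0.0], [1.0, 0.0, 1.0, 9.0], [2.0, 5.0, 2.0, 1.0]]
test_y = [0, 1, 2]

weights = [0.9, 0.05, 0.8, 0.1]          # hypothetical feature weights
idx = select_top_features(weights, k=2)  # keeps dimensions 0 and 2
train_sel = [project(x, idx) for x in train_X]
pred = [knn_predict(train_sel, train_y, project(x, idx)) for x in test_X]
print(idx, pred, macro_f1(test_y, pred))
```

Discarding the low-weight dimensions before the distance computation is what lets the nearest-neighbour vote ignore the noisy features in this toy data.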
Subject Keywords
Voice Disorder, Multi-class Classification, OpenL3, VGGish, Deep Learning
URI
https://hdl.handle.net/11511/108878
Collections
Northern Cyprus Campus, Thesis
Citation Formats
IEEE
M. U. Rahman, “MULTI-CLASS CLASSIFICATION OF VOICE DISORDERS USING DEEP LEARNING,” M.S. - Master of Science, Middle East Technical University, 2024.