OpenMETU
Phase-aware speech super resolution using U-net architecture with lattice topology
Download
Phase_aware_Speech_Super_Resolution_using_U_net_architecture_with_lattice_topology_2024.pdf
Date
2024-1-25
Author
Cenik, Yalçın
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Item Usage Stats
203 views, 121 downloads
Speech super resolution (SSR) is one of the main research areas in audio signal processing. Its aim is to extend the bandwidth of speech signals recorded at low sampling rates by estimating the missing high-frequency content. A speech signal whose bandwidth has been extended with accurately predicted high frequencies generally sounds better to the listener. Traditional signal processing methods such as interpolation do not solve this problem satisfactorily. Since generative models were introduced to the speech domain, synthetic speech generation and the optimization of models with loss functions based on generative models have become some of the most active research topics. Reconstructing magnitude and phase information together is critical for producing high-quality speech in speech synthesis. Reconstructing speech phase information is one of the main open problems in the literature: current methods either ignore phase information or try to estimate it from magnitude information within the network. This thesis proposes a U-net-based network with a lattice filter topology that processes magnitude and phase information together. In addition, a phase loss function is used to optimize the phase estimate accurately. Because upsampling is performed entirely in the frequency domain, the network estimates the entire spectrum, which avoids the artifacts that arise when upsampling is performed in the time domain. Experimental results show that the proposed method outperforms state-of-the-art methods on the ViSQOL metric and achieves comparable results on the LSD metric while using fewer model parameters.
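The abstract describes upsampling carried out entirely in the frequency domain and reports results in terms of LSD (log-spectral distance). The sketch below illustrates both ideas under stated assumptions; it is not the thesis's implementation. `zero_fill_upper_band` is a hypothetical helper showing how a low-rate power spectrum can be placed at the bottom of a wider target spectrum, leaving the upper bins empty for a model to estimate (it assumes both STFTs share the same frame duration, so bin k has the same centre frequency in each). `log_spectral_distance` uses one common definition of LSD: the root-mean-square difference of log10 power spectra per frame, averaged over frames. The thesis may use a different input representation, scaling, or LSD variant.

```python
import math
from typing import List

# Frames x frequency bins, holding non-negative power values |X(t, f)|^2.
Spectrogram = List[List[float]]


def zero_fill_upper_band(low_frames: Spectrogram, n_bins_high: int) -> Spectrogram:
    """Hypothetical helper: copy low-band bins and zero the missing upper band.

    Assumes the low- and high-rate STFTs use the same frame duration, so the
    low-rate bins line up with the lowest bins of the high-rate spectrum.
    """
    out: Spectrogram = []
    for frame in low_frames:
        if len(frame) > n_bins_high:
            raise ValueError("low-band frame has more bins than the target spectrum")
        # Upper bins are unknown at the low sampling rate; start them at zero.
        out.append(list(frame) + [0.0] * (n_bins_high - len(frame)))
    return out


def log_spectral_distance(ref: Spectrogram, est: Spectrogram, eps: float = 1e-10) -> float:
    """Frame-averaged LSD between two power spectrograms (one common definition).

    For each frame: sqrt(mean over bins of (log10 ref - log10 est)^2),
    then the mean over frames. `eps` guards against log10(0).
    """
    if len(ref) != len(est) or not ref:
        raise ValueError("spectrograms must have the same, non-zero number of frames")
    total = 0.0
    for r_frame, e_frame in zip(ref, est):
        if len(r_frame) != len(e_frame) or not r_frame:
            raise ValueError("frames must have the same, non-zero number of bins")
        squared = [
            (math.log10(r + eps) - math.log10(e + eps)) ** 2
            for r, e in zip(r_frame, e_frame)
        ]
        total += math.sqrt(sum(squared) / len(squared))
    return total / len(ref)


if __name__ == "__main__":
    low = [[1.0, 0.5, 0.25]]               # one frame, 3 low-band bins
    wide = zero_fill_upper_band(low, 5)    # [[1.0, 0.5, 0.25, 0.0, 0.0]]
    ref = [[1.0, 0.5, 0.25, 0.1, 0.01]]    # hypothetical full-band reference
    print(log_spectral_distance(ref, ref))   # 0.0 for identical spectra
    print(log_spectral_distance(ref, wide))  # large: empty upper band is penalized
```

The last line hints at why a zero-filled (or merely interpolated) upper band scores poorly on LSD: the log of a near-zero power is strongly negative, so any missing high-frequency energy dominates the per-frame error until a model estimates it.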
Subject Keywords
Speech super resolution, U-net with lattice topology, Phase-aware architecture
URI
https://hdl.handle.net/11511/108518
Collections
Graduate School of Natural and Applied Sciences, Thesis
Citation (IEEE)
Y. Cenik, “Phase-aware speech super resolution using U-net architecture with lattice topology,” M.S. - Master of Science, Middle East Technical University, 2024.