Deep learning for speech enhancement : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, New Zealand

Qiu, Yuanhang

dc.contributor.advisor	Wang, Ruili
dc.contributor.author	Qiu, Yuanhang
dc.date.accessioned	2022-01-24T04:42:38Z
dc.date.accessioned	2022-06-30T01:48:19Z
dc.date.available	2022-01-24T04:42:38Z
dc.date.available	2022-06-30T01:48:19Z
dc.date.issued	2022
dc.identifier.uri	http://hdl.handle.net/10179/17243
dc.description.abstract	Speech enhancement, which aims to improve the intelligibility and overall perceptual quality of a contaminated speech signal, is an effective way to improve speech communication. In this thesis, we propose three novel deep learning methods to improve speech enhancement performance. Firstly, we propose an adversarial latent representation learning method for exploring the latent space in generative adversarial network based speech enhancement. Building on adversarial feature learning, this method employs an extra encoder to learn an inverse mapping from the generated data distribution to the latent space. The encoder establishes an inner connection with the generator and contributes to learning latent information. Secondly, we propose an adversarial multi-task learning method with inverse mappings for effective speech representation. This method focuses on strengthening the generator's ability to capture speech information and learn representations. To implement it, two extra networks are developed to learn the inverse mappings from the generated distribution to the input data domains. Thirdly, we propose a phone-fortified method based on self-supervised learning to improve the learning of specific speech characteristics for speech enhancement. This method explicitly imports phonetic characteristics into a deep complex convolutional network via a contrastive predictive coding model pre-trained with self-supervised learning. The experimental results demonstrate that the proposed methods outperform previous speech enhancement methods and achieve state-of-the-art performance in terms of speech intelligibility and overall perceptual quality.	en_US
dc.publisher	Massey University	en_US
dc.rights	The Author	en_US
dc.subject	Speech processing systems	en
dc.subject	Machine learning	en
dc.title	Deep learning for speech enhancement : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, New Zealand	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Computer Science	en_US
thesis.degree.grantor	Massey University	en_US
thesis.degree.level	Doctoral	en_US
thesis.degree.name	Doctor of Philosophy (PhD)	en_US
dc.confidential	Embargo : No	en_US
dc.subject.anzsrc	460212 Speech recognition	en
dc.subject.anzsrc	461103 Deep learning	en

Files in this item

Name: QiuPhDThesis.pdf
Size: 6.444Mb
Format: PDF

This item appears in the following Collection(s)

Theses and Dissertations