The Effect of Speaking Rate on Vowel Variability Based on the Uncontrolled Manifold Approach and Flow-Based Invertible Neural Network Modeling.

Jaekoo Kang

ERIC Number: ED670362

Record Type: Non-Journal

Publication Date: 2021

Pages: 128

Abstractor: As Provided

ISBN: 979-8-4604-6680-1

ISSN: N/A

EISSN: N/A

Available Date: 0000-00-00

ProQuest LLC, Ph.D. Dissertation, City University of New York

Variability is intrinsic to human speech production. One approach to understanding variability in speech is to decompose it into task-irrelevant ("good") and task-relevant ("bad") parts with respect to speech tasks. Based on the uncontrolled manifold (UCM) approach, this dissertation investigates how vowel token-to-token variability in articulation and acoustics can be decomposed into "good" and "bad" parts and how speaking rate changes the pattern of these two parts, using data from the Haskins IEEE rate comparison database. Following a review of studies encompassing motor equivalence, coarticulation, and speaking rate in the first chapter, the second chapter carries out the UCM analysis on the IEEE vowels to test whether speaking rate changes the pattern of variability and its two subparts (i.e., "good" or UCM vs. "bad" or CM). When the rate accelerates, vowel reduction is observed in both articulation and acoustics, as expected. However, the normalized score between UCM and CM does not change significantly as a function of speaking rate, which suggests a possible reconsideration of vowel target specifications as well as methodological limitations reflecting the difference between speech and limb movement. In the third chapter, a modeling approach using flow-based invertible neural networks (FlowINN) is examined, focusing on how variability in speech can be learned directly by the model and whether it can overcome some of the limitations of the UCM analysis. When trained on the same IEEE vowel data, the inverse prediction of the articulation-acoustics model reveals the task-irrelevant or "good" articulatory variability, while the inverse prediction of the acoustics-category model demonstrates the task-irrelevant or "good" acoustic variability. Furthermore, the learned latent space allows probabilistic sampling of articulatory and acoustic data points, which is not possible in the UCM analysis. The application of the UCM analysis and the FlowINN modeling method is discussed, particularly focusing on how the "good" part of variability in speech can be useful rather than being disregarded as noise.

[The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
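
As an illustration of the UCM decomposition the abstract describes, the following is a minimal sketch and not the dissertation's own implementation. It assumes a linearized articulation-to-acoustics map is already available as a Jacobian matrix (for example, one estimated by regression), and it uses the per-degree-of-freedom normalization that is common in the motor-control literature; the exact normalization used in the dissertation is not stated in this record. The function name `ucm_decompose`, the variable names, and the synthetic data are all hypothetical.

```python
import numpy as np

def ucm_decompose(X, J):
    """Split token-to-token variance into UCM ("good") and CM ("bad") parts.

    X : (n_tokens, n_artic) articulatory measurements for one vowel.
    J : (n_task, n_artic) assumed linear map from articulation to the
        task variables (e.g., formants), evaluated at the mean posture.
    Returns (v_ucm, v_cm, index), variances normalized per degree of freedom.
    """
    X = np.asarray(X, dtype=float)
    J = np.asarray(J, dtype=float)
    n_tokens, n = X.shape
    d = np.linalg.matrix_rank(J)
    if not 0 < d < n:
        raise ValueError("J must have rank between 1 and n_artic - 1")

    # Deviations of each token from the mean articulatory configuration.
    dev = X - X.mean(axis=0)

    # Rows d..n-1 of Vt span the null space of J: directions in
    # articulatory space that leave the task variables unchanged.
    _, _, Vt = np.linalg.svd(J)
    null_basis = Vt[d:].T                      # shape (n, n - d)

    dev_ucm = dev @ null_basis @ null_basis.T  # task-irrelevant part
    dev_cm = dev - dev_ucm                     # task-relevant part

    v_ucm = (dev_ucm ** 2).sum() / ((n - d) * n_tokens)
    v_cm = (dev_cm ** 2).sum() / (d * n_tokens)
    v_tot = (dev ** 2).sum() / (n * n_tokens)
    return v_ucm, v_cm, (v_ucm - v_cm) / v_tot

# Synthetic demonstration only: random data, not the Haskins database.
rng = np.random.default_rng(0)
J_demo = rng.normal(size=(2, 5))      # 2 task variables, 5 articulators
X_demo = rng.normal(size=(200, 5))    # 200 hypothetical vowel tokens
v_ucm_demo, v_cm_demo, index_demo = ucm_decompose(X_demo, J_demo)
```

Under this convention, a positive index means that more variance per degree of freedom lies in directions that do not affect the task variables, which is the pattern the UCM approach interprets as "good" variability.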

Descriptors: Vowels, Speech Communication, Language Variation, Articulation (Speech), Acoustics, Databases, Motor Reactions, Psychomotor Skills, Models, Computational Linguistics, Artificial Intelligence

ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml

Publication Type: Dissertations/Theses - Doctoral Dissertations

Education Level: N/A

Audience: N/A

Language: English

Sponsor: N/A

Authoring Institution: N/A

Grant or Contract Numbers: N/A

Author Affiliations: N/A