TY - JOUR
T1 - A preliminary study of voice quality transformation based on modifications to the neutral vocal tract area function
AU - Story, Brad H.
AU - Titze, Ingo R.
N1 - Funding Information: This study was supported by grants NIH R01 DC02532-05 and NIH R01 DC04789-01 from the National Institute on Deafness and Other Communication Disorders (NIDCD). The authors also acknowledge two reviewers whose comments have helped to improve upon an earlier version of this manuscript.
PY - 2002/7
Y1 - 2002/7
N2 - The idea is pursued that voice quality can be partially represented by the underlying shape of a speaker's neutral vocal tract. Using an area function model, which allows direct access to the neutral tract shape, four separate modifications were made to one male speaker's vocal tract. The modifications involve imposing constrictive or expansive effects on the pharyngeal and oral portions of the neutral area function as well as on lip aperture and the epi-laryngeal tube. A single-word utterance was first synthesized by superimposing deformation patterns appropriate for the word onto the original neutral tract shape (area function). Then, four additional samples of the word were synthesized using a different modified neutral area function each time. The modifications were assessed by comparing F1-F2 formant trajectories of the original utterance with those of the modifications. The formant frequencies were observed to shift within the F1-F2 plane in directions predictable from simple tube acoustics. However, the modified voice qualities did not preserve the shape of the original F1-F2 trajectory. In other words, the modifications did not create a simple linear transformation of formant frequencies even though the "articulatory dynamics" (deformation patterns of the area function) were identical in all cases. These somewhat artificial vocal tract modifications were also compared with formant frequencies extracted from recordings of a speaker attempting to produce the same types of modifications. In general, the speaker's formant trajectories showed some similarities to the synthesized versions. However, the speaker also seemed to grade the "level" of the voice quality that was exerted on the utterance depending on whether the demands of the voice quality were in competition with the linguistic demands of a given phonetic segment. Finally, to demonstrate this type of voice quality modification in a broader context, the same procedures were applied to sentence-level speech and results were again shown as F1-F2 formant trajectories.
AB - The idea is pursued that voice quality can be partially represented by the underlying shape of a speaker's neutral vocal tract. Using an area function model, which allows direct access to the neutral tract shape, four separate modifications were made to one male speaker's vocal tract. The modifications involve imposing constrictive or expansive effects on the pharyngeal and oral portions of the neutral area function as well as on lip aperture and the epi-laryngeal tube. A single-word utterance was first synthesized by superimposing deformation patterns appropriate for the word onto the original neutral tract shape (area function). Then, four additional samples of the word were synthesized using a different modified neutral area function each time. The modifications were assessed by comparing F1-F2 formant trajectories of the original utterance with those of the modifications. The formant frequencies were observed to shift within the F1-F2 plane in directions predictable from simple tube acoustics. However, the modified voice qualities did not preserve the shape of the original F1-F2 trajectory. In other words, the modifications did not create a simple linear transformation of formant frequencies even though the "articulatory dynamics" (deformation patterns of the area function) were identical in all cases. These somewhat artificial vocal tract modifications were also compared with formant frequencies extracted from recordings of a speaker attempting to produce the same types of modifications. In general, the speaker's formant trajectories showed some similarities to the synthesized versions. However, the speaker also seemed to grade the "level" of the voice quality that was exerted on the utterance depending on whether the demands of the voice quality were in competition with the linguistic demands of a given phonetic segment. Finally, to demonstrate this type of voice quality modification in a broader context, the same procedures were applied to sentence-level speech and results were again shown as F1-F2 formant trajectories.
UR - http://www.scopus.com/inward/record.url?scp=0036656940&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036656940&partnerID=8YFLogxK
U2 - 10.1006/jpho.2002.0168
DO - 10.1006/jpho.2002.0168
M3 - Article
SN - 0095-4470
VL - 30
SP - 485
EP - 509
JO - Journal of Phonetics
JF - Journal of Phonetics
IS - 3
ER -