Consonant-Vowel Transition Models Based on Deep Learning for Objective Evaluation of Articulation

Vikram C. Mathad, Julie M. Liss, Kathy Chapman, Nancy Scherer, Visar Berisha

Research output: Contribution to journal (Article, peer-reviewed)

2 Scopus citations


Spectro-temporal dynamics of consonant-vowel (CV) transition regions are considered to provide robust cues related to articulation. In this work, we propose an objective measure of articulatory precision, dubbed the objective articulation measure (OAM), obtained by analyzing CV transitions segmented around vowel onsets. The OAM is derived from the posteriors of a convolutional neural network pre-trained to classify consonants using CV regions as input. We demonstrate that the OAM correlates with perceptual measures in a variety of contexts, including (a) adult dysarthric speech, (b) the speech of children with cleft lip/palate, and (c) a database of accented English speech from native Mandarin and Spanish speakers.
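The abstract does not give the formula that maps the network posteriors to the OAM, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes, hypothetically, that each CV segment yields a posterior distribution over consonant classes and that a score can be formed by averaging the posterior assigned to the intended consonant. The function names `articulation_score` and `pearson` and the toy inputs are assumptions introduced here for illustration.

```python
from math import sqrt


def articulation_score(posteriors, targets):
    """Mean posterior probability assigned to the intended consonant.

    ASSUMPTION: this averaging rule is illustrative only; the abstract does
    not state how the OAM is computed from the CNN posteriors.

    posteriors: list of dicts (consonant label -> probability), one per CV segment
    targets:    list of intended consonant labels, one per CV segment
    """
    if not targets or len(posteriors) != len(targets):
        raise ValueError("need one non-empty posterior per target consonant")
    total = sum(p.get(t, 0.0) for p, t in zip(posteriors, targets))
    return total / len(targets)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


# Toy example with made-up posteriors for two CV segments of one speaker.
segment_posteriors = [{"p": 0.9, "b": 0.1}, {"t": 0.5, "d": 0.5}]
intended = ["p", "t"]
score = articulation_score(segment_posteriors, intended)
```

Under these assumptions, per-speaker scores could be compared against perceptual ratings with `pearson(scores, ratings)`, which mirrors the kind of correlation analysis the abstract describes.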

Original language: English (US)
Pages (from-to): 86-95
Number of pages: 10
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
State: Published - 2023


Keywords

  • Articulation precision
  • second language learning
  • cleft lip and palate
  • consonant-vowel transitions
  • convolutional neural networks
  • dysarthria
  • pronunciation scores

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering

