TY - JOUR
T1 - Say what? A dataset for exploring the error patterns that two ASR engines make
AU - Moore, Meredith
AU - Saxon, Michael
AU - Venkateswara, Hemanth
AU - Berisha, Visar
AU - Panchanathan, Sethuraman
N1 - Funding Information: We wish to acknowledge the National Science Foundation (NSF) and their generous support through the NSF Graduate Research Fellowship program, as well as Arizona State University's Center for Cognitive Ubiquitous Computing. Publisher Copyright: Copyright © 2019 ISCA
PY - 2019
Y1 - 2019
N2 - We present a new metadataset that provides insight into where and how two ASR systems make errors on several different speech datasets. By making these data readily available to researchers, we hope to stimulate research on WER estimation models and thereby gain a deeper understanding of how intelligibility is encoded in speech. Using this dataset, we attempted to estimate intelligibility with a state-of-the-art model for speech quality estimation and found that this model was not effective at modeling speech intelligibility. This finding sheds light on the relationship between how speech quality and how intelligibility are encoded in acoustic features, and shows that there is much more to learn about modeling intelligibility effectively. We hope that the metadataset we present will stimulate research into systems that model intelligibility more effectively.
AB - We present a new metadataset that provides insight into where and how two ASR systems make errors on several different speech datasets. By making these data readily available to researchers, we hope to stimulate research on WER estimation models and thereby gain a deeper understanding of how intelligibility is encoded in speech. Using this dataset, we attempted to estimate intelligibility with a state-of-the-art model for speech quality estimation and found that this model was not effective at modeling speech intelligibility. This finding sheds light on the relationship between how speech quality and how intelligibility are encoded in acoustic features, and shows that there is much more to learn about modeling intelligibility effectively. We hope that the metadataset we present will stimulate research into systems that model intelligibility more effectively.
KW - Auditory perception
KW - Automatic speech recognition
KW - Error detection
KW - Estimation models
KW - Intelligibility
KW - Quality
UR - http://www.scopus.com/inward/record.url?scp=85074730761&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074730761&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2019-3096
DO - 10.21437/Interspeech.2019-3096
M3 - Conference article
SN - 2308-457X
VL - 2019-September
SP - 2528
EP - 2532
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019
Y2 - 15 September 2019 through 19 September 2019
ER -