TY - JOUR
T1 - UncommonVoice
T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
AU - Moore, Meredith
AU - Papreja, Piyush
AU - Saxon, Michael
AU - Berisha, Visar
AU - Panchanathan, Sethuraman
N1 - Funding Information: The authors would like to acknowledge the National Spasmodic Dysphonia Association for their support throughout the development of UncommonVoice, particularly for their effort in recruiting speakers for UncommonVoice. Also a special thank you to the National Science Foundation Graduate Research Fellowship. Publisher Copyright: Copyright © 2020 ISCA
PY - 2020
Y1 - 2020
AB - To facilitate more accessible spoken language technologies and advance the study of dysphonic speech, this paper presents UncommonVoice, a freely available, crowdsourced speech corpus consisting of 8.5 hours of speech from 57 individuals, 48 of whom have spasmodic dysphonia. The speech material consists of non-words (prolonged vowels and the prompt for diadochokinetic rate), sentences (randomly selected from TIMIT prompts and the CAPE-V intelligibility analysis), and spontaneous image descriptions. The data was recorded in a crowdsourced manner using a web-based application. This dataset is a fundamental resource for the development of voice-assistive technologies for individuals with dysphonia as well as the enhancement of the accessibility of voice-based technologies (automatic speech recognition, virtual assistants, etc.). Research on articulation differences, as well as on how best to model and represent dysphonic speech, will greatly benefit from a free and publicly available dataset of dysphonic speech. The dataset will be made freely and publicly available at www.uncommonvoice.org. In the following sections, we detail the data collection process and provide an initial analysis of the speech corpus.
KW - Dataset
KW - Human-computer interaction
KW - Spasmodic dysphonia
KW - Voice disorder
UR - http://www.scopus.com/inward/record.url?scp=85098128165&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098128165&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2020-3093
DO - 10.21437/Interspeech.2020-3093
M3 - Conference article
SN - 2308-457X
VL - 2020-October
SP - 2532
EP - 2536
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Y2 - 25 October 2020 through 29 October 2020
ER -