Aligning speech enhancement for improving downstream classification performance

Research output: Contribution to journal › Conference article › peer-review

Abstract

Speech-based classification models in the cloud are gaining large-scale adoption. In many applications, post-deployment background noise conditions differ from those seen during model training, and fine-tuning the original model on local data would likely improve performance. However, this is not always possible: the local user may not be authorized to modify the cloud-based model, or may be unable to share the data and corresponding labels required for fine-tuning. In this paper, we propose a denoiser stored locally on edge devices together with an application-specific training scheme. The scheme learns a custom speech enhancement mapping that aligns the local denoiser with the downstream model, without requiring access to the cloud-based weights. We evaluate the denoiser on a common classification task, keyword spotting, and demonstrate with two different downstream architectures that the proposed scheme outperforms common speech enhancement models across different types of background noise.
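As a rough illustration of the alignment idea described in the abstract, the sketch below trains a small local denoiser against the classification loss of a frozen keyword-spotting model rather than a signal-level reconstruction loss. The denoiser architecture, the surrogate classifier, and all tensor shapes are assumptions made for illustration only; in particular, the paper's scheme works without access to the cloud model's weights, which this simplified, fully local example does not capture.

```python
# Illustrative sketch only (not the paper's method): update a local denoiser so
# that its output minimizes the classification loss of a frozen keyword-spotting
# model, instead of a purely signal-level loss. All module names, shapes, and the
# surrogate classifier are assumptions.
import torch
import torch.nn as nn

class LocalDenoiser(nn.Module):
    """Tiny 1-D convolutional enhancement front end (placeholder architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(16, 1, kernel_size=9, padding=4),
        )

    def forward(self, x):            # x: (batch, 1, samples)
        return self.net(x)

def train_step(denoiser, classifier, optimizer, noisy, labels):
    """One alignment step: the classifier stays frozen; only the denoiser updates."""
    classifier.eval()
    for p in classifier.parameters():
        p.requires_grad_(False)
    logits = classifier(denoiser(noisy))           # classify the enhanced speech
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()                                # gradients flow only into the denoiser
    optimizer.step()
    return loss.item()

# Example usage with random tensors standing in for noisy keyword utterances.
if __name__ == "__main__":
    denoiser = LocalDenoiser()
    classifier = nn.Sequential(                    # stand-in for the downstream model
        nn.Flatten(), nn.Linear(16000, 12)         # 12 keyword classes, 1 s at 16 kHz
    )
    opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)
    noisy = torch.randn(8, 1, 16000)
    labels = torch.randint(0, 12, (8,))
    print(train_step(denoiser, classifier, opt, noisy, labels))
```

The key design point the sketch tries to convey is that the enhancement front end is optimized for the downstream task objective rather than for waveform or spectral fidelity; how the paper achieves this without cloud-side weight access is described in the full text, not here.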

Original language: English (US)
Pages (from-to): 3874-3878
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
DOIs
State: Published - 2023
Event: Interspeech 2023, 24th Annual Conference of the International Speech Communication Association - Dublin, Ireland
Duration: Aug 20 2023 - Aug 24 2023

Keywords

  • capsule network
  • cloud computing
  • data privacy
  • keyword spotting
  • speech enhancement

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
