Abstract
Current ASR systems struggle to reliably recognize the speech of Deaf and Hard of Hearing (DHH) individuals, particularly in real-time communication. Existing personalization methods typically require extensive pre-recorded data and place the burden entirely on DHH users. We present EvolveCaptions, a live ASR adaptation system that supports collaborative, in-the-moment personalization. Hearing participants correct ASR errors during conversation, and the system generates short, phonetically relevant phrases for the DHH speaker to record. These recordings are then used to iteratively fine-tune the ASR model. In a preliminary evaluation, our system reduced word error rate from 0.53 to 0.27 over four adaptation rounds with minimal user effort. This work introduces a low-effort, socially collaborative method for adapting ASR to diverse DHH voices in real-world settings.
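For context on the reported numbers, word error rate (WER) is the word-level edit distance between the ASR hypothesis and the reference transcript, divided by the reference length. A minimal sketch of the standard computation (not the paper's evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

A WER of 0.53 therefore means roughly one word in two was wrong before adaptation, versus about one in four afterward.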
Video Presentation
BibTeX
TBD