EvolveCaptions: Real-Time Collaborative ASR Adaptation for DHH Speakers

University of Michigan
ACM SIGACCESS 2025
Teaser figure

Overview of EvolveCaptions. (1) Hearing users correct live captions of the DHH speaker’s voice. (2) The DHH speaker records targeted phrases generated from the corrected terms. (3) The Whisper ASR model is fine-tuned with the recordings and adapts to the speaker over time.

Abstract

Current automatic speech recognition (ASR) systems struggle to reliably recognize the speech of Deaf and Hard of Hearing (DHH) individuals, particularly in real-time communication. Existing personalization methods typically require extensive pre-recorded data and place the entire burden on DHH users. We present EvolveCaptions, a live ASR adaptation system that supports collaborative, in-the-moment personalization. Hearing participants correct ASR errors during conversation, and the system generates short, phonetically relevant phrases for the DHH speaker to record. These recordings are then used to iteratively fine-tune the ASR model. In a preliminary evaluation, EvolveCaptions reduced the word error rate (WER) from 0.53 to 0.27 over four adaptation rounds with minimal user effort. This work introduces a low-effort, socially collaborative method for adapting ASR to diverse DHH voices in real-world settings.
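The reported improvement (0.53 to 0.27) is measured in word error rate: the word-level edit distance (substitutions + deletions + insertions) between the ASR hypothesis and a reference transcript, normalized by the reference length. As background only, a minimal sketch of how WER is computed (illustrative, not the paper's evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words (standard Levenshtein DP).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion over six words
```

In practice, libraries such as jiwer implement this with additional text normalization; the point here is only what the 0.53 and 0.27 figures quantify.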

Video Presentation

BibTeX

TBD