Abstract
Communication Access Realtime Translation (CART) is a captioning technology widely used by deaf and hard of hearing (DHH) individuals. It is valued for its high accuracy and its ability to convey speaker cues and contextual sounds in real time. However, CART performance can degrade under challenging conditions such as background noise, technical jargon, or rapid speech, which reduces caption quality and hinders comprehension. We introduce CARTGPT, a real-time captioning system that enhances CART transcripts by combining large language models (LLMs) with automatic speech recognition (ASR) input to detect and correct transcription errors. To inform the design of CARTGPT, we conducted a formative study with 10 professional CART captioners. The study identified common sources of error and captured the captioners' perspectives on using AI for caption correction. We evaluated CARTGPT on a 39.7-hour speech dataset spanning medical, technical, and conversational domains. CARTGPT improved word accuracy by 5.6% over standard CART and by 17.3% over a state-of-the-art ASR model. In a user study with 16 DHH participants, CARTGPT captions were rated as significantly more comprehensible, particularly in technical scenarios, and the system maintained real-time responsiveness. These findings demonstrate the potential of LLM-assisted captioning to improve accessibility and comprehension for DHH users in real-world settings.
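The abstract reports gains in word accuracy but does not define the metric. A common convention is word accuracy = 1 - WER. Here WER (word error rate) is the number of word substitutions, deletions, and insertions needed to turn a caption into a reference transcript, divided by the number of reference words. This convention is an assumption, not something the abstract states. The minimal Python sketch below computes that quantity with a word-level Levenshtein alignment. The function names `word_errors` and `word_accuracy`, and the medical example sentence, are hypothetical illustrations rather than CARTGPT's evaluation code.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions, and insertions that
    turn the hypothesis into the reference (Levenshtein distance over words).
    Assumption: simple whitespace tokenization, no text normalization."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Row 0: aligning an empty reference against hyp[:j] costs j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        # curr[j] = edit distance between ref[:i] and hyp[:j]
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from caption
                curr[j - 1] + 1,     # insertion: extra word in caption
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy under the assumed convention 1 - WER."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / n_ref


if __name__ == "__main__":
    reference = "the patient has atrial fibrillation"
    garbled = "the patient has a tree of fibrillation"  # hypothetical jargon error
    corrected = "the patient has atrial fibrillation"   # hypothetical corrected caption
    print(word_errors(reference, garbled), word_accuracy(reference, garbled))
    print(word_errors(reference, corrected), word_accuracy(reference, corrected))
```

In the garbled caption, "atrial" is substituted by "a" and "tree" and "of" are inserted. That gives 3 errors over 5 reference words, so word accuracy is 0.4, while the corrected caption scores 1.0. The abstract does not say whether the reported 5.6% and 17.3% gains are absolute percentage points or relative changes in such a score.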