CARTGPT: Real-Time Correction of CART Captions Using Large Language Models
🏅 Best Paper Nominee, ASSETS 2025
Abstract
Communication Access Realtime Translation (CART) is a widely used captioning technology among deaf and hard of hearing (DHH) individuals, valued for its high accuracy and ability to convey speaker cues and contextual sounds in real time. However, CART performance can degrade in challenging conditions such as background noise, technical jargon, or rapid speech—reducing caption quality and impacting comprehension. We introduce CARTGPT, a real-time captioning system that enhances CART transcripts by leveraging large language models (LLMs) and automatic speech recognition (ASR) input to detect and correct transcription errors. To inform the design of CARTGPT, we conducted a formative study with 10 professional CART captioners to identify common sources of error and their perspectives on using AI for caption correction. We evaluated CARTGPT on a 39.7-hour speech dataset spanning medical, technical, and conversational domains, observing a 5.6% improvement in word accuracy over standard CART and 17.3% over a state-of-the-art ASR model. In a user study with 16 DHH participants, CARTGPT captions were rated as significantly more comprehensible, particularly in technical scenarios, while maintaining real-time responsiveness. These findings demonstrate the potential of LLM-assisted captioning to improve accessibility and comprehension for DHH users in real-world settings.
Poster at ASSETS 2024
BibTeX
@inproceedings{10.1145/3663547.3746326,
author = {Wu, Liang-Yuan and Kleiver, Andrea and Jain, Dhruv},
title = {CARTGPT: Real-Time Correction of CART Captions Using Large Language Models},
year = {2025},
isbn = {9798400706769},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3663547.3746326},
doi = {10.1145/3663547.3746326},
abstract = {Communication Access Realtime Translation (CART) is a widely used captioning technology among deaf and hard of hearing (DHH) individuals, valued for its high accuracy and ability to convey speaker cues and contextual sounds in real time. However, CART performance can degrade in challenging conditions such as background noise, technical jargon, or rapid speech—reducing caption quality and impacting comprehension. We introduce CARTGPT, a real-time captioning system that enhances CART transcripts by leveraging large language models (LLMs) and automatic speech recognition (ASR) input to detect and correct transcription errors. To inform the design of CARTGPT, we conducted a formative study with 10 professional CART captioners to identify common sources of error and their perspectives on using AI for caption correction. We evaluated CARTGPT on a 39.7-hour speech dataset spanning medical, technical, and conversational domains, observing a 5.6\% improvement in word accuracy over standard CART and 17.3\% over a state-of-the-art ASR model. In a user study with 16 DHH participants, CARTGPT captions were rated as significantly more comprehensible, particularly in technical scenarios, while maintaining real-time responsiveness. These findings demonstrate the potential of LLM-assisted captioning to improve accessibility and comprehension for DHH users in real-world settings.},
booktitle = {Proceedings of the 27th International ACM SIGACCESS Conference on Computers and Accessibility},
articleno = {73},
numpages = {11},
keywords = {Accessibility, Deaf and hard of hearing, generative AI, real-time captioning},
series = {ASSETS '25}
}