SoundNarratives: Rich Auditory Scene Descriptions to Support Deaf and Hard of Hearing People
Abstract
Sound recognition enhances safety, social interaction, and situational awareness for deaf and hard of hearing (DHH) individuals. However, existing sound recognition technologies primarily classify sounds into predefined categories (e.g., door opening, speech), which fail to capture the full complexity of real-world auditory scenes (e.g., temporal variations, sound transitions, overlapping sound layers). In this work, we introduce SoundNarratives, a real-time system that generates rich, contextual auditory scene descriptions tailored to DHH users. We began by conducting a formative study with 10 DHH participants to identify nine key auditory scene parameters (e.g., sound class, loudness, emotion, semantic description), and used these insights to guide prompt engineering with a state-of-the-art audio language model. A user study with 10 DHH participants demonstrated a significant preference for SoundNarratives over a baseline model, along with the potential for improved confidence and situational awareness.
Nine Sound Parameters
SoundNarratives describes each auditory scene through nine key perceptual parameters, combining acoustic and semantic cues; a brief data sketch follows the list below.
Sound Class
Categorizes the type of sound, such as speech, music, or environmental noise.
Loudness
Represents the perceived intensity of the sound.
Speaker Dynamics
Captures how speakers vary in their manner of speaking and the intentions behind their calls.
Spatial Dynamics
Describes movement and distance of sounds within the environment.
Emotion
Reflects the affective tone of the sounds, such as happy, angry, or sad.
Pace
Measures the speed or tempo of sound events over time.
Prominence
Indicates which sounds stand out or attract attention in the scene.
Pattern
Captures recurring sequences or temporal structures in sound events.
Semantic Descriptions
Provides human-understandable context and meaning for the sounds.
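To make the parameter set concrete, the sketch below shows one way the nine parameters could be grouped into a single record. The field names, types, and example values are illustrative assumptions, not the paper's actual schema.

```python
# A minimal sketch of the nine auditory scene parameters as one record.
# Field names and types are illustrative assumptions, not the authors' schema.
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuditorySceneDescription:
    sound_class: str                  # e.g., "speech", "music", "environmental noise"
    loudness: str                     # perceived intensity, e.g., "soft", "loud"
    speaker_dynamics: Optional[str]   # manner of speaking and speaker intent, if any speech
    spatial_dynamics: str             # movement and distance of sound sources
    emotion: str                      # affective tone, e.g., "happy", "angry", "sad"
    pace: str                         # speed or tempo of sound events over time
    prominence: str                   # which sounds stand out or attract attention
    pattern: str                      # recurring sequences or temporal structure
    semantic_description: str         # human-readable context and meaning of the scene
```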
System Overview
SoundNarratives processes each auditory scene with AudioFlamingo to derive nine key sound parameters, which are then summarized by GPT-4 into a concise, human-readable description.
Example output from SoundNarratives: "A crow caws loudly and repeatedly, with four caws at irregular intervals. A man speaks briefly."
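The sketch below illustrates the two-stage pipeline described above: an audio language model is prompted for each of the nine parameters, and GPT-4 condenses the results into one short narrative. The `query_audio_flamingo` helper is a hypothetical placeholder for an audio language model call, and the prompts are illustrative, not the authors' implementation.

```python
# A minimal sketch of the two-stage pipeline, assuming a hypothetical
# query_audio_flamingo helper stands in for the audio language model and
# GPT-4 (via the OpenAI Python client) performs the summarization.
from openai import OpenAI

PARAMETERS = [
    "sound class", "loudness", "speaker dynamics", "spatial dynamics",
    "emotion", "pace", "prominence", "pattern", "semantic description",
]


def query_audio_flamingo(audio_path: str, prompt: str) -> str:
    """Placeholder for a call to an audio language model such as Audio Flamingo."""
    raise NotImplementedError("Hook up an audio language model here.")


def describe_scene(audio_path: str) -> str:
    # Stage 1: prompt the audio language model once per parameter.
    parameter_values = {
        p: query_audio_flamingo(audio_path, f"Describe the {p} of this audio clip.")
        for p in PARAMETERS
    }

    # Stage 2: ask GPT-4 to condense the parameters into one concise description.
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    "Summarize these auditory scene parameters into one concise, "
                    "human-readable description for a deaf or hard of hearing reader."
                ),
            },
            {"role": "user", "content": str(parameter_values)},
        ],
    )
    return response.choices[0].message.content
```

In practice, the per-parameter prompts and the summarization instruction would be tuned through the kind of prompt engineering the paper describes; this sketch only shows the overall data flow.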
Poster at CHI 2025: GAI and A11y Workshop
BibTeX
@inproceedings{10.1145/3663547.3746341,
author = {Wu, Liang-Yuan and Jain, Dhruv},
title = {SoundNarratives: Rich Auditory Scene Descriptions to Support Deaf and Hard of Hearing People},
year = {2025},
isbn = {9798400706769},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3663547.3746341},
doi = {10.1145/3663547.3746341},
abstract = {Sound recognition enhances safety, social interaction, and situational awareness for deaf and hard of hearing (DHH) individuals. However, existing sound recognition technologies primarily classify sounds into predefined categories (e.g., door opening, speech), which fail to capture the full complexity of real-world auditory scenes (e.g., temporal variations, sound transitions, overlapping sound layers). In this work, we introduce SoundNarratives, a real-time system that generates rich, contextual auditory scene descriptions tailored to DHH users. We began with conducting a formative study with 10 DHH participants to identify nine key auditory scene parameters (e.g., sound class, loudness, emotion, semantic description), and used these insights to guide prompt engineering with a state-of-the-art audio language model. A user study with 10 DHH participants demonstrated a significant preference for SoundNarratives over a baseline model, along with a potential for improved confidence and situational awareness.},
booktitle = {Proceedings of the 27th International ACM SIGACCESS Conference on Computers and Accessibility},
articleno = {68},
numpages = {15},
keywords = {Accessibility, human-AI interaction, sound awareness, deaf and hard of hearing, generative AI, prompt engineering, auditory scene analysis},
series = {ASSETS '25}
}