Abstract
Sound recognition enhances safety, social interaction, and situational awareness for deaf and hard of hearing (DHH) individuals. However, existing sound recognition technologies primarily classify sounds into predefined categories (e.g., door opening, speech), an approach that fails to capture the full complexity of real-world auditory scenes (e.g., temporal variations, sound transitions, overlapping sound layers). In this work, we introduce SoundNarratives, a real-time system that generates rich, contextual auditory scene descriptions tailored to DHH users. We began by conducting a formative study with 10 DHH participants to identify nine key auditory scene parameters (e.g., sound class, loudness, emotion, semantic description), and used these insights to guide prompt engineering with a state-of-the-art audio language model. A user study with 10 DHH participants demonstrated a significant preference for SoundNarratives over a baseline model, along with potential gains in confidence and situational awareness.
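To make the parameter-guided prompting step concrete, the minimal Python sketch below assembles a prompt that asks an audio language model to address each identified scene parameter. Only the four parameters named in the abstract come from the paper; the remaining five are not listed here, and the prompt wording and the query_audio_lm helper are hypothetical placeholders rather than the authors' actual implementation.

    # Minimal sketch of parameter-guided prompt construction.
    # The four entries below are named in the abstract; the paper
    # identifies nine parameters in total, the rest unlisted here.
    SCENE_PARAMETERS = [
        "sound class",
        "loudness",
        "emotion",
        "semantic description",
    ]

    def build_scene_prompt(parameters):
        """Assemble an instruction asking the model to cover each
        auditory scene parameter in its description."""
        bullets = "\n".join(f"- {p}" for p in parameters)
        return (
            "Describe the auditory scene in this clip for a deaf or "
            "hard of hearing listener. Cover each of the following:\n"
            f"{bullets}\n"
            "Respond with one concise, contextual paragraph."
        )

    def query_audio_lm(audio_path, prompt):
        """Hypothetical stand-in for whatever audio language model
        API the system actually uses."""
        raise NotImplementedError("wire this to an audio LM of choice")

    if __name__ == "__main__":
        print(build_scene_prompt(SCENE_PARAMETERS))

In a real pipeline, query_audio_lm would stream audio segments to the model alongside this prompt; the sketch only demonstrates how the formative-study parameters could shape the instruction text.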
BibTeX
TBD