Word-Level Error Analysis in Decoding Systems: From Speech Recognition to Brain-Computer Interfaces

Abstract

Brain-to-text (BTT) systems that decode attempted speech from neural activity have achieved a 4.2% word error rate (WER). These systems demonstrate potential for daily use comparable to automatic speech recognition (ASR) systems, enabling communication for individuals with profound speech loss. To examine fine-grained error patterns in both BTT and ASR, we introduce a refined alignment algorithm to detect word edits, along with four word-level metrics to assess the exact correctness and semantic distance of these edits. Analyzing errors by word frequency reveals a significant performance disparity among frequent, infrequent, and rare words across all models. Although transformer-based architectures and self-supervised pre-training achieve lower error rates, the gap between frequent and infrequent words remains substantial. Our analysis indicates that misclassifying infrequent words incurs higher semantic costs, suggesting that addressing this word-level performance gap could enhance overall system usability across ASR and BTT.
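As background for the word-edit analysis the abstract describes, the sketch below shows a standard word-level Levenshtein alignment that counts substitutions, insertions, and deletions and derives WER from them. This is a minimal baseline illustration, not the paper's refined alignment algorithm or its semantic-distance metrics.

```python
# Minimal sketch: word-level edit alignment and WER via a standard
# Levenshtein dynamic program. Assumes whitespace-tokenized word lists;
# the paper's refined alignment and semantic metrics are not reproduced here.

def align_words(ref, hyp):
    """Return (substitutions, insertions, deletions) between two word lists."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (distance, subs, ins, dels) aligning ref[:i] with hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, 0, i)  # delete all reference words
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, j, 0)  # insert all hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # exact match, no edit
            else:
                sub = dp[i - 1][j - 1]
                dele = dp[i - 1][j]
                ins = dp[i][j - 1]
                best = min(sub, dele, ins)  # compare by distance first
                if best is sub:
                    dp[i][j] = (sub[0] + 1, sub[1] + 1, sub[2], sub[3])
                elif best is dele:
                    dp[i][j] = (dele[0] + 1, dele[1], dele[2], dele[3] + 1)
                else:
                    dp[i][j] = (ins[0] + 1, ins[1], ins[2] + 1, ins[3])
    _, s, i_, d = dp[n][m]
    return s, i_, d

def wer(ref, hyp):
    """WER = (S + I + D) / number of reference words."""
    s, i_, d = align_words(ref, hyp)
    return (s + i_ + d) / max(len(ref), 1)
```

For example, `wer("the cat sat".split(), "the bat sat".split())` yields one substitution out of three reference words, i.e. 1/3.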

Publication
Interspeech 2025