Perception research suggests that speech processing involves competition between multiple intermediate representations. In realistic conditions these representations can contain conflicting information.
Goal: Rather than decoding a single spectro-temporal speech representation in a fixed manner into ever-larger units (sub-phone units, phones, words), we aim to develop a computational model that uses multiple signal representations (e.g., articulatory, short-term and long-term spectro-temporal) to automatically learn and exploit the redundancy in the speech signal.
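One common way such multi-stream processing is realized in the speech-recognition literature is log-linear fusion of the per-frame class posteriors produced from each representation, with per-stream weights reflecting reliability. The sketch below is purely illustrative and is not the project's actual model; the stream names, labels, and weights are hypothetical.

```python
import math

def combine_streams(stream_scores, weights):
    """Log-linear fusion of per-stream class posteriors.

    stream_scores: one dict per representation (e.g. articulatory,
    spectro-temporal), each mapping a phone label to a probability.
    weights: per-stream reliability weights.
    """
    labels = stream_scores[0].keys()
    fused = {}
    for label in labels:
        # Weighted sum of log-probabilities = log of a weighted
        # geometric mean, so streams must agree to score highly.
        fused[label] = sum(w * math.log(s[label])
                           for s, w in zip(stream_scores, weights))
    # Renormalize back to a probability distribution.
    z = sum(math.exp(v) for v in fused.values())
    return {k: math.exp(v) / z for k, v in fused.items()}

# Hypothetical example: two representations conflict on /b/ vs /p/;
# weighting the more reliable stream higher resolves the conflict.
articulatory = {"b": 0.7, "p": 0.3}
spectral     = {"b": 0.4, "p": 0.6}
fused = combine_streams([articulatory, spectral], [0.7, 0.3])
```

Weighting schemes like this are one simple way conflicting evidence across representations can be reconciled; the project's model would learn such trade-offs from data rather than fix them by hand.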
Relevance: Comparing the model's recognition errors with those of humans, for speech that differs in degree of spontaneity and environmental noise (or for synthetic stimuli), will provide valuable insight into how humans deal with the variation induced by realistic adverse conditions.
Main host institution: Radboud University Nijmegen
Second host institution: Technical University of Denmark
Industry partner: Nuance Communications International BVBA