The main goal of this project is to explain the consonant confusions that listeners typically make in difficult listening situations. The problem is approached from a modelling perspective: a model will be developed to predict the average pattern of consonant confusions for normal-hearing and hearing-impaired listeners presented with nonsense syllables.
Background: This project is concerned with “microscopic” speech intelligibility, meaning that the focus is on the fundamental building blocks of speech, i.e., the phonemes ("p", "t", "i", etc.). Phonemes tend to be confused with one another in adverse conditions, often following a specific confusion pattern; for example, “b” may be heard as “n” when a certain amount of background noise is present. In the approach used here, additional “high-level” information - such as the vocabulary and syntactic structure of the language - is intentionally neglected in order to investigate how the acoustic information is decoded by the auditory system, leading to our perception of speech.
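As a minimal illustration of what such a confusion pattern looks like, the sketch below tallies (presented, heard) response pairs from a listening test into a confusion matrix. The phoneme labels and counts here are invented purely for illustration, not actual experimental data:

```python
from collections import Counter

# Hypothetical listening-test responses: (presented, heard) pairs.
responses = [
    ("b", "b"), ("b", "n"), ("b", "n"),
    ("p", "p"), ("p", "t"),
    ("t", "t"),
]

counts = Counter(responses)
phonemes = sorted({p for p, _ in responses} | {h for _, h in responses})

# Row = presented phoneme, column = reported phoneme; off-diagonal
# entries are the confusions the model should predict.
matrix = {p: {h: counts[(p, h)] for h in phonemes} for p in phonemes}
```

Averaging such matrices over many listeners yields the group-level confusion pattern that the model is meant to reproduce.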
Approach: The confusions will be investigated using an auditory signal processing model that receives the same acoustic signal as the human listener. To obtain a successful model, close attention must be paid to the acoustic characteristics the model considers. One characteristic considered in this project is the slow fluctuations in the level of the signal, which have been shown to be crucial for “macroscopic” speech intelligibility of meaningful sentences. The goal is to investigate to what extent these level fluctuations also matter at the microscopic level, i.e., for the recognition and confusion of individual phonemes.
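The slow level fluctuations referred to above are commonly extracted as a temporal envelope. The sketch below shows one simple way to do this - full-wave rectification followed by a moving-average low-pass filter - applied to an amplitude-modulated tone; the 8 Hz cutoff and the filter choice are illustrative assumptions, not the actual model's front end:

```python
import numpy as np

def slow_envelope(signal, fs, cutoff_hz=8.0):
    """Extract the slow level fluctuations (temporal envelope) of a signal.

    Illustrative sketch: full-wave rectification followed by a simple
    moving-average low-pass filter with its first null at cutoff_hz.
    """
    rectified = np.abs(signal)
    win = max(1, int(fs / cutoff_hz))      # window length ≈ 1 / cutoff
    kernel = np.ones(win) / win
    return np.convolve(rectified, kernel, mode="same")

# Example: a 1 kHz tone amplitude-modulated at 4 Hz. The envelope should
# follow the slow 4 Hz modulation, not the fast 1 kHz carrier.
fs = 16000
t = np.arange(fs) / fs
carrier = np.sin(2 * np.pi * 1000 * t)
modulator = 1 + 0.8 * np.sin(2 * np.pi * 4 * t)
env = slow_envelope(modulator * carrier, fs)
```

In the model, an envelope of this kind (per auditory frequency channel) would be the representation whose fluctuations are tested for their role in phoneme recognition.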
Relevance: This research is highly relevant for applications such as hearing-aid development, automatic speech recognition, and speech synthesis. For example, one can try to imitate the excellent speech recognition capabilities of the human auditory system and thereby help recover the information lost due to hearing impairment, making speech intelligible again for hearing-impaired individuals. Furthermore, the insights obtained can help make computer applications understand speech commands more accurately and produce more intelligible speech.
Main host institution: Technical University of Denmark
Second host institution: University College London
Industry partner: Phonak AG