The human auditory system organizes mixtures of sounds into streams using grouping cues such as harmonicity and common modulation. This bottom-up processing contributes to the recognition of sounds in noise. Existing computational models of this phenomenon are very limited.
Goal: The project will develop a computational model of primitive (bottom-up) grouping of sounds. The model will be based on modern machine learning methods such as sparse coding and non-negative matrix factorization, which have been used successfully to separate complex mixtures of sounds into streams.
Relevance: The model will provide a mid-level representation that can serve as the basis for higher-level models that predict speech intelligibility in environmental noise. The representation itself can also be used to predict how separable different types of speech and noise are.
Main host institution: Tampere University of Technology
Second host institution: Katholieke Universiteit Leuven
Industry partner: Nokia Oyj