The final set of projects concerns engineering robustness in speech technology and aims to use knowledge of human speech processing to improve the performance of automatic speech recognition. Speech recognition in dialogue systems is fragile: performance deteriorates rapidly in even mild noise or reverberation, and is currently far from human levels in realistic conditions. Work within this theme will also feed back into the construction of more sophisticated intelligibility models, many of which are currently based on approaches to automatic speech recognition that are known to be inadequate.
Project ENR-1 aims to sidestep the reductionist representational bottleneck of current speech recognition systems by providing alternative intermediate representations of speech, and to learn to exploit reliable combinations of acoustic features.
Project ENR-2 explores exemplar-based signal representations, which use stored knowledge of speech patterns. This approach helps reduce the effects of extraneous sources through top-down processing and a long temporal context.
Project ENR-3 studies the grouping of sounds in the auditory system using sparse representations. This bottom-up processing contributes to the local segregation of speech from noise, and provides a sparse mid-level representation that can be used by higher-level systems such as the one developed in Project ENR-2.
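As a rough illustration of the exemplar-based idea behind Project ENR-2, the sketch below approximates a noisy observation as a sparse, non-negative combination of stored exemplars and keeps only the speech part. It is not the projects' actual method: the function name `sparse_activations`, the dictionary sizes, the sparsity weight and the random "exemplars" are all assumptions made for illustration, and the solver is a standard multiplicative update for a non-negative least-squares fit with an L1 penalty. In a real system each exemplar could stack several consecutive spectral frames to capture a long temporal context.

```python
import numpy as np

def sparse_activations(x, A, sparsity=0.1, n_iter=200, eps=1e-12):
    """Find non-negative, sparse weights h such that A @ h approximates x.

    Uses multiplicative updates for 0.5 * ||x - A h||^2 + sparsity * sum(h),
    which keep h non-negative as long as x and A are non-negative.
    """
    h = np.ones(A.shape[1])
    AtX = A.T @ x
    AtA = A.T @ A
    for _ in range(n_iter):
        h *= AtX / (AtA @ h + sparsity + eps)
    return h

# Synthetic, hypothetical dictionary: columns are stored exemplars.
rng = np.random.default_rng(0)
n_bins, n_speech, n_noise = 40, 30, 10
speech_atoms = rng.random((n_bins, n_speech))
noise_atoms = rng.random((n_bins, n_noise))
A = np.hstack([speech_atoms, noise_atoms])

# A toy "noisy" observation built from two speech exemplars and one noise exemplar.
clean = speech_atoms[:, 3] + 0.5 * speech_atoms[:, 17]
noisy = clean + 0.8 * noise_atoms[:, 2]

h = sparse_activations(noisy, A)

# Top-down separation: rebuild each source from its own exemplars only.
speech_estimate = speech_atoms @ h[:n_speech]
noise_estimate = noise_atoms @ h[n_speech:]
```

Splitting the activation vector by dictionary section is what lets stored speech knowledge suppress extraneous sources: energy explained by noise exemplars is simply left out of the speech estimate.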