ENR2 An integrated model of human speech recognition based on sparse representations and exemplar matching

Early Stage Researcher: Deepak Baby

Main host institution: Katholieke Universiteit Leuven

Main Host Supervisor: Hugo Van hamme

Second host institution: Tampere University of Technology

Second Host Supervisor: Tuomas Virtanen

Industry partner: Nuance Communication International bvba

Automatic speech recognition (ASR) enables a computer or other electronic device to identify spoken words and perform the corresponding action. With the popularity of electronic gadgets these days, it is quite common to find a built-in ASR system that can automate actions such as search, texting, and navigation simply by talking to the device. ASR also has applications in the military, for people with disabilities, and in many other domains.

For an ASR system to work properly, it has to correctly recognize what the user is saying. But even after decades of research, the performance of ASR systems is still far inferior to that of humans, especially in the presence of background noise. Most algorithms work reasonably well under controlled laboratory conditions, which differ greatly from recordings made in realistic environments. The goal of this project is to incorporate more sophisticated knowledge of human hearing into the ASR framework, so that the system behaves in a more human-like manner and better matches realistic conditions.

When recognizing speech or images, the brain tries to match a pattern in the uttered speech or the visual scene. For an electronic machine to perform ASR, it needs to extract characteristic properties, or features, from the noisy speech and then recognize learned patterns that eventually lead the system to the underlying words. In a noisy environment, the learned feature patterns are corrupted by the added noise, which leads to erroneous results and degrades the performance of the ASR system. It has therefore been found that cleaning or enhancing the corrupted features, by removing the artefacts introduced by the noise, improves recognition performance.
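A common choice of feature in noise-robust ASR is the mel-scaled filterbank energy spectrum. The NumPy sketch below illustrates one typical way such features are computed; the window length, hop size, and number of filters are generic example values, not the configuration used in this project.

```python
import numpy as np

def mel_filterbank_features(signal, sr=16000, n_fft=512, hop=160, n_mels=26):
    """Log mel-filterbank energies computed with plain NumPy (illustrative sketch)."""
    # Short-time magnitude spectrum
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))      # shape: (frames, n_fft//2 + 1)

    # Triangular mel filters spanning 0 Hz to the Nyquist frequency
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    # Log compression roughly mimics the ear's nonlinear loudness response
    return np.log(mag @ fbank.T + 1e-10)           # shape: (frames, n_mels)

# Example: features for 1 second of noise-like audio
x = np.random.randn(16000)
feats = mel_filterbank_features(x)
print(feats.shape)  # (97, 26)
```

The mel scale and the log compression are two simple examples of knowledge about human hearing already present in standard feature extraction; this project aims to go further in that direction.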

To enhance the features, we first need to train the system to differentiate between speech and noise. To do so, we extract speech and noise feature patterns, store them as "exemplars", and then try to represent the feature patterns obtained from the noisy data as a sum of speech and noise exemplars. The part corresponding to speech is then separated out and fed to the ASR system. The performance of the algorithm thus depends on how well the speech and noise features are differentiated. The goal of this work is therefore to exploit knowledge about human auditory processing in the feature extraction, so that a better separation of speech from noise can be obtained. The proposed methods will be evaluated on available benchmarks for comparing ASR systems.
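The decomposition described above can be sketched as non-negative sparse coding against a joint dictionary of speech and noise exemplars. In the toy example below the dictionaries and the observation are random stand-ins for real feature patterns, and the final masking step is one plausible way to extract the speech part; none of it should be read as the project's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy exemplar dictionaries: each column is one non-negative feature pattern.
# In a real system these would be stored speech and noise spectro-temporal patches.
D_speech = rng.random((30, 40))      # 30-dim features, 40 speech exemplars
D_noise = rng.random((30, 20))       # 20 noise exemplars
D = np.hstack([D_speech, D_noise])   # joint speech-plus-noise dictionary

# Noisy observation: an (unknown to the decoder) mix of speech and noise patterns
y = D_speech @ rng.random(40) + D_noise @ rng.random(20)

# Non-negative sparse coding via multiplicative updates for the KL divergence,
# with an L1 sparsity penalty lam on the activations x.
x = np.ones(D.shape[1])
lam = 0.1
for _ in range(200):
    x *= (D.T @ (y / (D @ x + 1e-10))) / (D.sum(axis=0) + lam)

# Split the reconstruction into its speech and noise parts; a Wiener-style
# mask built from the speech part then yields the enhanced features.
speech_part = D_speech @ x[:40]
noise_part = D_noise @ x[40:]
mask = speech_part / (speech_part + noise_part + 1e-10)
enhanced = mask * y
```

The multiplicative update keeps the activations non-negative by construction, which matches the physical intuition that speech and noise energies add rather than cancel in the feature domain.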

The ultimate outcome of this research is thus to improve the performance of current ASR systems in noisy environments by drawing on knowledge about human auditory processing. As more of these ideas are incorporated, the result will also grow into an integrated computational model of human hearing.

