Early Stage Researcher: Nemanja Cvijanovic
Main host institution: Philips Research Laboratories Eindhoven
Main host supervisors: Armin Kohlrausch, Patrick Kechichian, Kees Janse
Second host institution: University College London
Second host supervisor: Valerie Hazan
Academic partner: University of Edinburgh
Our population is ageing, and elderly people continue to live independently for longer. To stay connected to friends and family all around the world, elderly people increasingly rely on telecommunication systems for a major part of their social interactions. Furthermore, with the growing popularity of telemedicine and e-health applications, where the patient may be at home receiving instructions from a physician or giving updates and reporting problems, telecommunication systems also contribute to the safety and well-being of elderly people. Ensuring a high level of quality and intelligibility of the transmitted speech signal is therefore essential in these scenarios. In light of these developments, it is important to analyze and understand the adverse effects of age on communication, especially telecommunication, and to develop strategies to address them. Age can adversely affect both sides of the communication channel: age-related degradations may occur in speech perception as well as in speech production. As part of the INSPIRE project, these negative effects need to be examined in realistic acoustic environments, e.g., in the presence of ambient noise.
The goal of this work is to collect and combine knowledge on age-related communication degradations and to use it to develop strategies and signal processing algorithms that compensate for them, thus improving the usability of telecommunication systems for the elderly. One well-known problem is the deterioration of a speech signal in the presence of background noise on either side of the communication channel, which, in combination with age-related degradations, poses an even greater challenge for the elderly.
One of the methods used in this project to address this problem is multi-modal speech enhancement, where a standard microphone is used in combination with a non-audio sensor to clean up the degraded speech signal, e.g., to remove background noise (radio, kitchen appliances, etc.) during a Skype conversation in the living room. Currently, we are investigating the use of an ultrasound sensor, consisting of an ultrasound transmitter and receiver aimed at the speaker’s face. While the microphone captures the degraded speech, the ultrasound sensor aims to capture the articulatory movements that occur during speech production. To accomplish this, an ultrasound beam is emitted from the sensor, and the reflections from the speaker’s mouth are captured by the ultrasound receiver. Once the articulatory movements are captured, a link between the acoustic and articulatory features can be established and used to clean up the noisy speech signal. This strategy can also be employed to enhance the barely audible speech of elderly speakers who speak very softly. This would enable the elderly to communicate in noisy situations where they would normally need to shout, and thus reduce fatigue. The method therefore addresses both the speech perception and the speech production side.
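As a rough illustration of how a non-audio sensor can help clean up a noisy microphone signal, the sketch below shows one simple possibility. It is not the method developed in this project. It assumes that the ultrasound sensor has already been reduced to a per-frame flag, here called articulating, that says whether the speaker's mouth is moving. Frames without articulation are used to estimate the noise spectrum, and a spectral-subtraction-style gain is then applied to every frame. The function names, the flag, and all parameter values are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def enhance(noisy, articulating, frame_len=512, hop=256, floor=0.1):
    """Suppress noise using an articulation flag from a non-audio sensor.

    noisy        : 1-D microphone signal
    articulating : one boolean per frame, True while the (hypothetical)
                   ultrasound front end detects mouth movement
    floor        : minimum gain, limits distortion of weak components
    """
    if hop * 2 != frame_len:
        raise ValueError("this sketch assumes 50% overlap (hop = frame_len / 2)")
    articulating = np.asarray(articulating, dtype=bool)

    # Square-root periodic Hann window, used for analysis and synthesis.
    # With 50% overlap the squared windows sum to one.
    window = np.sqrt(np.hanning(frame_len + 1)[:-1])

    frames = frame_signal(noisy, frame_len, hop) * window
    if len(articulating) != len(frames):
        raise ValueError("need exactly one articulation flag per frame")
    if articulating.all():
        raise ValueError("need at least one frame without articulation")

    spec = np.fft.rfft(frames, axis=1)
    power = np.abs(spec) ** 2

    # Noise spectrum: average power over frames where the mouth is still.
    noise_psd = power[~articulating].mean(axis=0)

    # Spectral-subtraction-style gain, limited from below by the floor.
    gain = np.maximum(1.0 - noise_psd / np.maximum(power, 1e-12), floor)

    # Back to the time domain with weighted overlap-add.
    out_frames = np.fft.irfft(gain * spec, n=frame_len, axis=1) * window
    out = np.zeros(len(noisy))
    for i, frame in enumerate(out_frames):
        out[i * hop:i * hop + frame_len] += frame
    return out
```

In a real system the articulation flag would have to be derived from the received ultrasound reflections, and a richer mapping between articulatory and acoustic features would likely replace the simple on/off decision assumed here.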
The expected outcome of the project is a deeper understanding of age-related problems in the use of telecommunication systems and a user-friendly, non-intrusive and robust system to address these problems.