Among the distributions of listener responses to noisy speech tokens, some of the most informative are those that point to consistent confusions, where many listeners agree on the same incorrect response. Collecting a corpus of such confusions will require many listeners to screen a very large number of noisy tokens. Our intention is to set up efficient web-based speech perception tests and to generate publicity that encourages many thousands of listeners to participate, along the lines of other public science projects such as Galaxy Zoo.
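As an illustration of what a consistent confusion looks like in a response distribution, the following minimal Python sketch tallies the responses to one noisy token and reports the most common incorrect response. It is not the project's actual screening procedure: the function name `summarize_responses` and the thresholds `min_listeners` and `min_agreement` are hypothetical values chosen only for the example.

```python
from collections import Counter


def summarize_responses(target, responses, min_listeners=10, min_agreement=0.5):
    """Summarize listener responses to a single noisy token.

    Hypothetical criterion (an assumption, not the project's definition):
    a confusion counts as "robust" when at least `min_listeners` people
    heard the token and at least `min_agreement` of them gave the same
    incorrect response.
    """
    target = target.strip().lower()
    counts = Counter(r.strip().lower() for r in responses)
    total = sum(counts.values())

    # Keep only responses that differ from the word actually presented.
    errors = {resp: n for resp, n in counts.items() if resp != target}
    if total == 0 or not errors:
        return {"confusion": None, "agreement": 0.0, "robust": False}

    # The most frequent incorrect response and the share of listeners giving it.
    confusion, n = max(errors.items(), key=lambda item: item[1])
    agreement = n / total
    robust = total >= min_listeners and agreement >= min_agreement
    return {"confusion": confusion, "agreement": agreement, "robust": robust}


# Example: 12 listeners hear a noisy token of "dog"; 8 of them report "fog".
consistent = summarize_responses("dog", ["fog"] * 8 + ["dog"] * 3 + ["bog"])

# Example: the errors are scattered, so no single confusion dominates.
scattered = summarize_responses(
    "dog", ["dog"] * 6 + ["fog", "bog", "log", "hog", "jog", "cog"]
)
```

Under these assumed thresholds the first token would be flagged as a candidate confusion and the second would not, because its incorrect responses are spread over many alternatives.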
Goal: The aim is to provide the means to collect a corpus of 1000 clear exemplars of robust listener confusions in several languages.
Relevance: The corpus will serve as the basis for the evaluation of microscopic models throughout the duration of the project, and will provide an ongoing resource for the INSPIRE Challenge.
Status: Completed. A corpus of over 1200 robust confusions has been collected and is available for download. The corpus is described in Garcia Lecumberri et al. (2013).
Main host institution: Universidad del País Vasco
Second host institution: University of Sheffield