
D. Models of Human Perception of Impoverished Signals

One manifestation of the robustness of auditory perception is the remarkable resilience of human speech recognition to spectrally and temporally impoverished signals. We have been carrying out experimental and modeling studies to understand the basis of this ability, and to abstract useful principles that can be applied in acoustic applications with similarly distorted signals. The theoretical focus of this research has been the finding that, in a favorable listening environment where acoustic information is abundant, individual variability is small. However, when speech signals are severely impoverished, listeners differ in their ability to make use of the available cues, and performance improves non-linearly as additional cues are provided [Van Tassell and Trine 1996].

To better understand these issues, we propose to compare listener performance with that of an event-oriented speech recognizer that uses landmarks in the speech signal to focus attention where additional information needs to be extracted from the acoustic signal [Espy-Wilson 1994]. In this recognition strategy, acoustic properties for features are extracted in a hierarchical fashion. Specifically, manner features (such as sonorant and consonantal), which relate to the degree of opening/closing in the vocal tract, are extracted first, in an event-oriented manner, to define regions and/or specific landmarks within the speech signal. These landmarks are then used in the extraction of information related to other features (such as the place-of-articulation features). In this type of control strategy, recognition is neither frame based nor segment based, and no assumption is made that speech sounds are nonoverlapping or simply juxtaposed. As a result, recognition of completely or partially co-articulated sounds should be possible.
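The two-stage control flow described above can be illustrated with a toy sketch. This is not the recognizer of [Espy-Wilson 1994]: the per-frame energy contour, the fixed threshold, the landmark labels "release" and "closure", and the function names find_landmarks and windows_around are all assumptions made for illustration. The first stage marks abrupt changes in a crude stand-in for a manner cue, and the second stage selects the regions around those landmarks where further (e.g. place-of-articulation) analysis would be carried out.

```python
from typing import List, Sequence, Tuple


def find_landmarks(energy: Sequence[float], threshold: float) -> List[Tuple[int, str]]:
    """Stage 1 (hypothetical): mark frames where the energy contour crosses
    the threshold, as a crude stand-in for an abrupt manner change."""
    landmarks = []
    for i in range(1, len(energy)):
        was_open = energy[i - 1] >= threshold
        is_open = energy[i] >= threshold
        if is_open and not was_open:
            landmarks.append((i, "release"))   # low -> high energy
        elif was_open and not is_open:
            landmarks.append((i, "closure"))   # high -> low energy
    return landmarks


def windows_around(
    landmarks: Sequence[Tuple[int, str]], n_frames: int, half_width: int
) -> List[Tuple[str, int, int]]:
    """Stage 2 (hypothetical): return (kind, start, end) frame spans centred
    on each landmark; later feature extraction would look only inside them.
    Spans are clipped to the signal and are allowed to overlap."""
    spans = []
    for idx, kind in landmarks:
        start = max(0, idx - half_width)
        end = min(n_frames, idx + half_width + 1)  # end is exclusive
        spans.append((kind, start, end))
    return spans


# Made-up energy contour, one value per analysis frame.
energy = [0.1, 0.1, 0.9, 1.0, 0.8, 0.2, 0.1, 0.7, 0.9]
landmarks = find_landmarks(energy, threshold=0.5)
spans = windows_around(landmarks, len(energy), half_width=1)
```

In this sketch the analysis regions are anchored to events rather than to a fixed frame grid, and neighbouring regions may share frames, which loosely mirrors the point that the sounds are not assumed to be nonoverlapping.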

In addition to the above modeling work, we shall carry out parallel testing of a range of human listeners both to validate the recognizer and refine its algorithms, and as an approach to modeling human performance. In this project, analysis of the error patterns from the listening task as well as the overall scores will focus on patterns that are relevant within the task context.






Didier A. Depireux
Mon May 19 16:57:46 EDT 1997