Next: C. Robust Sound Up: Thrust Area III. Previous: A. Physiologically-Based Models

B. Models of Human Identification of Complex Acoustic Patterns in Multisource Environments

In designing systems for the automatic identification and tracking of multiple sound sources (as described earlier in Thrust area I(C)), an essential first step is to organize the acoustic environment by isolating and characterizing its multiple targets. Humans perform this task effortlessly, apparently integrating a wide range of cues and attributes including localization, pitch, onset and offset times, and spectral shape. Our understanding of this extremely complicated integrative process remains inadequate, partly because of the dearth of psychoacoustical data and partly because of the lack of sophisticated models to describe the data that are available. The psychoacoustical experiments and models proposed below constitute a methodical approach towards the goal of formulating and successfully applying algorithms for the analysis of complex auditory scenes.

The proposed work is based on a theory of auditory masking that identifies two distinctly different processes that interfere with target sound reception. One process, called "energetic masking," is peripheral in nature and can be modeled as a consequence of overlapping patterns of excitation on the basilar membrane and in the auditory nerve. The second process, called "informational masking," is a consequence of cognitive factors involved in the perceptual segregation of sound sources, the formation of auditory images, and the recognition of acoustic patterns [Yost 1991, Kidd 1994, Kidd Mason 1995]. Our proposed models of this second process are unique in that they will combine realistic models of preprocessing in the auditory periphery with decision strategies derived from perceptual studies of auditory organization and masking.

To obtain adequate psychoacoustical data for developing the models, we plan to measure human performance on non-speech pattern identification tasks in masked conditions, using maskers that produce energetic and informational masking. A crucial aspect of the approach to designing machine-based recognition systems is the role played by the peripheral structures in representing the stimuli. Specifically, we plan to model the data with a host of early and central auditory algorithms (see Thrust area I). Three experimental tasks are planned. In each there is a target and a masker that is either energetic or informational. The sounds may originate from the same or from different spatial locations, which allows spatial separation to be evaluated as one means of source segregation. The pattern-based differences between signal and masker provide other bases for segregation. The proposed experiments are:

(i) In a non-speech pattern identification task, the listener identifies members of a set of six target patterns that are presented in highly uncertain conditions. The listener must judge which member of the signal set was presented and which of the seven loudspeakers played the signal.

(ii) The listener is presented with two sounds on each observation, both of which contain a signal pattern as in (i) above. The task of the listener is to judge which pattern is corrupted by frequency or time perturbation.

(iii) The task of the listener is to identify the source of a broadband complex sound from among four sources having different spectral envelopes.
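
As a purely illustrative sketch of how responses in experiment (i) might be tabulated, the Python fragment below draws trials and scores pattern identification, loudspeaker identification, and their conjunction. Only the set size (six patterns) and the number of loudspeakers (seven) come from the description above. The uniform, independent random selection of pattern and loudspeaker, and the names Trial, draw_trial, score, and chance_levels, are assumptions made for this example and are not part of the proposed procedure.

```python
import random
from dataclasses import dataclass

N_PATTERNS = 6       # size of the target set in experiment (i)
N_LOUDSPEAKERS = 7   # number of loudspeakers in experiment (i)


@dataclass(frozen=True)
class Trial:
    """One presentation, or one listener response (hypothetical structure)."""
    pattern: int      # index of the target pattern, 0..N_PATTERNS-1
    loudspeaker: int  # index of the loudspeaker, 0..N_LOUDSPEAKERS-1


def draw_trial(rng):
    """Draw a trial; uniform, independent selection is an assumption."""
    return Trial(rng.randrange(N_PATTERNS), rng.randrange(N_LOUDSPEAKERS))


def score(trials, responses):
    """Return the proportion correct for pattern, loudspeaker, and both."""
    n = len(trials)
    pairs = list(zip(trials, responses))
    return {
        "pattern": sum(t.pattern == r.pattern for t, r in pairs) / n,
        "loudspeaker": sum(t.loudspeaker == r.loudspeaker for t, r in pairs) / n,
        "joint": sum(t == r for t, r in pairs) / n,
    }


def chance_levels():
    """Guessing rates under the uniform, independent assumption."""
    return {
        "pattern": 1 / N_PATTERNS,
        "loudspeaker": 1 / N_LOUDSPEAKERS,
        "joint": 1 / (N_PATTERNS * N_LOUDSPEAKERS),
    }
```

Scoring the two judgments separately as well as jointly keeps errors of pattern identification distinct from errors of localization, and the chance levels give a floor against which masked performance could be compared.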

Using these human psychophysical results as a guide, we plan to devise a set of grouping/segregation rules that significantly extend current theories and make much more specific predictions for observer performance in multisource listening environments. These rules will be used, together with techniques evaluated in other sections of this proposal, to design, test, and implement in hardware automatic recognition systems that separate multiple sound sources and identify which member of a set of patterns each source produces on each observation.






Didier A. Depireux
Mon May 19 16:57:46 EDT 1997