
C. Auditory Scenes and Dynamic Binding

The effective exploitation of auditory pre-processors and related signal processing techniques for DoD applications, such as mechanical fault classification and analysis or the detection and classification of underwater transients, requires tools that cope with complex auditory scenes containing multiple (non-stationary, moving, randomly modulated) sources. Further, the significance of the perceived sound fields generated by these sources may lie in appropriate higher-level interpretations (e.g. those based on causal relationships). For instance, if the sound sources correspond to several sets of bearings and shafts in a piece of machinery (e.g. a helicopter gearbox), features of the sound patterns corresponding to bearing wear may have direct causal relationships (i.e. links) with features of the sound patterns that reflect the onset of shaft vibrations (and hence possible impending mechanical failure). Thus, in addition to extracting, separating and localizing the sources, one needs to construct representations of higher-level composite entities such as 'Fault-type A', 'Wear-stage B', etc., and of the (causal and other) relationships or links between them. The set of relevant composite entities and links is not known a priori and may be transient, and hence requires dynamic updating. We propose to work on the "dynamic binding problem": the association of features of sound patterns with sources/objects, and the representation of bindings or links between features, between features and sources, and between sources.
Any solution to this problem must meet certain requirements, partly based on biological plausibility: (a) Neural representations allocated to such entities have to arise in a compositional manner out of the activity patterns of the neural units themselves, and must not require architectures based on specialized detectors for each feature and each relationship. Given the possibly infinite variety of inter-relationships that, say, spectral features and sources/objects can have in any set of data, dedicated detectors would lead to a combinatorial explosion. (b) Bindings have to be relational (i.e. qualified in terms of domain-specific relations among items), be reversible/breakable, and be given in terms of learnable compositional representations [Geman and Bienenstock 1994].

Our technical approach to the binding problem is based on a proposal of von der Malsburg [Malsburg 1986] that the fine temporal structure of neural activity be viewed as the medium for expressing dynamic binding. This can be taken further in two directions: (a) represent binding via groups of neurons firing periodically (so that one can encode via both the level of firing and the phase of firing); (b) represent binding via wave-like activity patterns in large networks, called synfire chains (a technically complex idea due to Abeles [Abeles 1982]). The first approach has seen applications to visual scenes [Hummel and Biederman 1992] and to natural language query processing [Shastri and Ajjanagadde 1993].

Recent discoveries concerning thalamo-cortical oscillations [Engel 1992] have also lent credence to the idea that oscillatory neural networks play an important part in the solution to the dynamic binding problem, with coherent oscillations encoding the binding together of features of an image in a receptive field. While the most prominent of these studies concentrate on visual perception, it is reasonable to expect that similar mechanisms apply to spatio-temporal representations of auditory data [Gray 1994].

We propose to exploit such ideas in auditory pre-processing, developing explicit models and algorithms based on the theory of coupled nonlinear oscillators. In addition to the fundamental mathematical aspects and the design of such nonlinear dynamical pre-processors with internal degrees of freedom, we plan to investigate analog CMOS implementation of oscillator networks with adaptive weights. We plan to build on our prior work along these lines [Justh 1996], carried out in collaboration with NRL, which we describe briefly below.

We have proposed a class of networks inspired by previous work of Zemel, Mozer and Williams [.ZEMEL 1995.]. Specifically these networks take the form

where

and is an activation function of suitable form. The units are complex-valued, which allows binding to be encoded via phase-angle coherence. The weights are also complex, with Hermitian symmetry and a vanishing diagonal (no self-loops). The underlying theory is based on random nodes taking unit-modulus complex values; when the phase is allowed to follow the von Mises distribution,

then the activation function for the "mean-field theory" of the network is given in terms of a ratio of Bessel functions, and the corresponding mean-field entropy is

Combining this entropy with the mean-field energy

we get the free energy , where .
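
For orientation, the von Mises phase distribution and the Bessel-function ratio mentioned above can be written in their standard textbook form. The symbols used here (phase \theta, mean phase \bar\theta, concentration m) are assumed for illustration and are not necessarily the notation of the original equations; the last line is the entropy of a single von Mises variable, not the network's mean-field entropy expression.

```latex
% Standard von Mises density on the circle; I_0, I_1 are modified Bessel functions of the first kind.
\[
  p(\theta \mid \bar\theta, m) \;=\; \frac{1}{2\pi I_0(m)}\,
  \exp\!\bigl(m \cos(\theta - \bar\theta)\bigr),
  \qquad \theta \in [0, 2\pi),\quad m \ge 0 .
\]
% Mean of the unit-modulus complex variable e^{i\theta}: its length is the Bessel ratio.
\[
  \mathbb{E}\bigl[e^{i\theta}\bigr] \;=\; \frac{I_1(m)}{I_0(m)}\; e^{i\bar\theta} .
\]
% Differential entropy of a single von Mises variable.
\[
  H(m) \;=\; \log\bigl(2\pi I_0(m)\bigr) \;-\; m\,\frac{I_1(m)}{I_0(m)} .
\]
```

The ratio I_1(m)/I_0(m) increases from 0 toward 1 as m grows, so a unit with a strongly concentrated phase has a mean of nearly unit length, while a unit with a diffuse phase has a mean near zero.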

In our work [Justh 1995], [Justh 1996], we used these ideas and a rigorous analysis of the free energy as a Lyapunov function to obtain a careful proof of convergence. Further, the network and the convergence proof have been generalized in several ways to make the network more applicable to actual engineering problems. We have argued that such coupled oscillator circuits are more natural to implement in analog hardware than other types of dynamical equations, because the signal levels tend to remain large enough that the effects of offsets and mismatch are minimized. We have shown: (i) how a pair of coupled oscillators can be used to compensate for the feedback path phase shift in a complex LMS loop (with potential applications to analog adaptive antenna arrays or linear predictor circuits); (ii) how a single oscillator circuit with feedback could be used for continuous wavelet transform applications.
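
To make the idea of binding through phase coherence concrete, the following is a minimal sketch of a mean-field iteration on complex-valued units with a Bessel-ratio activation, in the spirit of the networks discussed above. It is not the network of [Justh 1996], whose defining equations are not reproduced here. The update rule, the inverse-temperature parameter `beta`, the real symmetric weight matrix, the two-group layout and all function names are assumptions chosen for illustration.

```python
import cmath
import math
import random


def bessel_ratio(m, n_grid=400):
    """Approximate I_1(m) / I_0(m) using I_n(m) = (1/pi) * int_0^pi exp(m cos t) cos(n t) dt."""
    num = 0.0
    den = 0.0
    for k in range(n_grid):
        t = math.pi * (k + 0.5) / n_grid        # midpoint rule on [0, pi]
        w = math.exp(m * (math.cos(t) - 1.0))   # factor exp(-m) cancels in the ratio
        num += w * math.cos(t)
        den += w
    return num / den


def mean_field_sweep(z, W, beta):
    """One asynchronous sweep (assumed rule): z_j <- A(beta |x_j|) x_j / |x_j|, x_j = sum_k W[j][k] z_k."""
    n = len(z)
    for j in range(n):
        x = sum(W[j][k] * z[k] for k in range(n))
        mag = abs(x)
        if mag > 1e-12:                          # leave the unit unchanged if its input vanishes
            z[j] = bessel_ratio(beta * mag) * x / mag
    return z


def phase_difference(a, b):
    """Phase of a relative to b, in (-pi, pi]."""
    return cmath.phase(a * b.conjugate())


def demo(seed=0, sweeps=200, beta=2.0):
    """Two hypothetical feature groups: positive weights within a group, negative across groups."""
    rng = random.Random(seed)
    groups = [0, 0, 0, 1, 1, 1]
    n = len(groups)
    # Real symmetric weights (a special case of Hermitian) with zero diagonal, i.e. no self-loops.
    W = [[0.0 if j == k else (1.0 if groups[j] == groups[k] else -0.5)
          for k in range(n)] for j in range(n)]
    # Small-amplitude start with random phases.
    z = [0.1 * cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(n)]
    for _ in range(sweeps):
        mean_field_sweep(z, W, beta)
    return groups, z


if __name__ == "__main__":
    groups, z = demo()
    for g, v in zip(groups, z):
        print(f"group {g}: magnitude {abs(v):.3f}, phase relative to unit 0 "
              f"{phase_difference(v, z[0]):+.3f} rad")
```

In this toy setting, units that belong to the same group are expected to settle to a common phase, while the two groups settle roughly half a cycle apart, so group membership can be read off from phase alone. The magnitude of each unit plays the role of a firing level, and the phase carries the binding.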

We propose to take these developments further, into a careful investigation of dynamic binding of auditory scenes via both coupled oscillators and synfire chains, as in the work of Abeles and that of Bienenstock. We propose to enrich the type of oscillator networks we study (in terms of the unit responses and the connectivity patterns), and to test our ideas on computational examples and on data from DoD applications.






Didier A. Depireux
Mon May 19 16:39:55 EDT 1997