In a multiscale representation, the pattern under consideration is represented repeatedly with different degrees of resolution. For instance, a vowel spectrum can be represented coarsely by its envelope (showing only its formant peaks), or more finely by displaying its harmonic peaks. Similarly, a time waveform can be viewed on different time-scales by dilating its time-axis. Such multiresolution representations are a ubiquitous feature of signal analysis at all levels of the auditory system. For instance, spectral analysis in the earliest auditory stages is performed by cochlear filters that exhibit progressively broader bandwidths at higher frequencies [.Yang Shamma 1992.]. This type of filter-bank provides for both a fast dynamic response to detect transient events, and fine spectral resolution for slower sustained sounds.
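The trade-off between bandwidth and response speed noted above can be illustrated with a minimal sketch of a constant-Q filter bank. It is not the cochlear model of the cited work: the function name `constant_q_bank`, the octave spacing, and the quality factor `q = 4` are assumptions chosen only for illustration. Each filter's bandwidth is taken to be proportional to its center frequency, and its time resolution is approximated as the reciprocal of that bandwidth.

```python
import numpy as np

def constant_q_bank(f_low=100.0, n_filters=5, filters_per_octave=1, q=4.0):
    """Return center frequencies (Hz), bandwidths (Hz), and approximate
    time resolutions (s) for a hypothetical constant-Q filter bank."""
    k = np.arange(n_filters)
    # Center frequencies are spaced logarithmically (assumed: octave steps).
    centers = f_low * 2.0 ** (k / filters_per_octave)
    # Constant Q: bandwidth grows in proportion to center frequency.
    bandwidths = centers / q
    # Rough time-frequency trade-off: a broader filter responds faster.
    time_resolution = 1.0 / bandwidths
    return centers, bandwidths, time_resolution

centers, bandwidths, time_resolution = constant_q_bank()
for fc, bw, dt in zip(centers, bandwidths, time_resolution):
    print(f"fc = {fc:7.1f} Hz   bandwidth = {bw:6.1f} Hz   time resolution = {dt * 1e3:5.2f} ms")
```

Under these assumptions the high-frequency channels are broad and fast, which suits transient detection, while the low-frequency channels are narrow and slow, which suits fine spectral resolution of sustained sounds.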
Higher central auditory stages further analyze the spectral profile into more elaborate multiscale representations, with respect to both the frequency and time axes [.Shamma rip1 1995, shamma mov1 1996.]. The effective overall transformation of the acoustic signal then resembles a double wavelet transform of the sound signal, reminiscent of the double Fourier transform used in cepstral analysis [.Oppenheim Schafer 1976.]. However, cortical and cepstral computations differ fundamentally in that cortical computations are local, whereas cepstral coefficients are global in nature [.Wang Shamma 1995 cor.]. Understanding the theoretical and practical implications of these differences for the noise-robustness of acoustic analysis systems is a fundamental aim of our program.
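The local-versus-global distinction can be made concrete with a small numerical sketch. It is not the cortical model itself: the Mexican-hat kernel, the scales (2, 4, 8 samples), the synthetic spectral profile, and the function names `mexican_hat`, `multiscale`, and `cepstral` are all assumptions made for illustration. A spectral profile is perturbed at a single frequency channel, and the resulting change is compared between a multiscale (wavelet-like) analysis along the frequency axis and a cepstrum-like analysis (Fourier transform of the log profile).

```python
import numpy as np

def mexican_hat(scale):
    """Finite-support Mexican-hat kernel (assumed analysis filter)."""
    half = int(4 * scale)
    x = np.arange(-half, half + 1) / scale
    return (1.0 - x**2) * np.exp(-0.5 * x**2)

def multiscale(profile, scales=(2, 4, 8)):
    """Filter the profile along the frequency axis at several scales."""
    return np.stack([np.convolve(profile, mexican_hat(s), mode="same")
                     for s in scales])

def cepstral(profile):
    """Cepstrum-like coefficients: Fourier transform of the log profile."""
    return np.fft.rfft(np.log(profile))

n = 256
axis = np.arange(n)
profile = 1.0 + 0.5 * np.cos(2 * np.pi * axis / 64)  # synthetic, strictly positive
perturbed = profile.copy()
perturbed[40] += 0.3                                  # local distortion at one channel

ms_change = np.abs(multiscale(perturbed) - multiscale(profile))
cep_change = np.abs(cepstral(perturbed) - cepstral(profile))

# Channels farther from the distortion than the widest kernel's half-width (32).
far = np.abs(axis - 40) > 32
far_ms_change = ms_change[:, far].max()
cep_fraction_changed = np.mean(cep_change > 1e-9)

print("largest multiscale change far from the distortion:", far_ms_change)
print("fraction of cepstral coefficients changed:", cep_fraction_changed)
```

In this sketch the multiscale coefficients change only in the neighborhood of the distorted channel, whereas every cepstral coefficient is altered. This is one simple way in which locality may bear on noise-robustness: a narrowband disturbance stays confined in the local representation but spreads across the entire global one.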
The objective of the projects proposed here and later in Thrust area II(A) is to explore the way these multiscale representations provide concurrent and complementary views of the spectrum. For example, how are spectro-temporal features due to the vibrating elements of the sound source, as opposed to those due to the shape and composition of its resonant structures and enclosures, encoded in the different scales? And how are they to be separated and extracted prior to their detection, identification, or subsequent utilization by other higher-level perceptual processes? Understanding these issues will permit us to employ such multiscale representations as front-ends in numerous applications where complex sound attributes such as timbre and pitch are needed, as in machine monitoring and diagnostics.
The multiscale analysis that we shall investigate and employ in the various applications (see Thrust area V) is based on a computational model derived from extensive neurophysiological data collected in the auditory cortex [.Shamma rip1 1995, Shamma Wang 1995.]. Preliminary investigations of the model's features and utility are already underway [.Shamma Wang 1995.]. Future enhancements to the multiscale algorithm include: (1) Extending the algorithm to perform the multiscale temporal decomposition of dynamic spectra. This is an important extension for applications in which signals may change rapidly in time, such as acoustic signals from faltering machines, acoustic transients, and speech consonants. (2) Incorporating explicit physiological parameters and psychoacoustical detection limits. These parameters allow for more accurate models of human and animal perception, and hence a better understanding of the physiological and perceptual processes underlying the remarkable abilities of the auditory system.