Jonathan Z. Simon: Research

Auditory Neural Computations and Time

Can the brain be thought of as a kind of computer? While this might be a topic of debate, few would deny that the brain does perform computations. The goal of my research program is to identify and describe, quantitatively and algorithmically, such neural computations, specifically those performed in the brain's auditory system. This research sheds light on the function of the brain in general and of the auditory system in particular, and it also allows us to discover new algorithms otherwise unknown to engineering or science.

Such neural computations employ algorithms developed and fine-tuned by millions of years of evolution, and as a result they are typically far beyond the capability of even the most advanced computers. By identifying, understanding, and quantitatively characterizing the computations performed by the brain, however, it is possible to recover those algorithms. This computational level of understanding has great potential benefits for engineering applications (e.g., auditory-based identification methods, robust speaker identification, robust speech processing) as well as for health-related applications (hearing aids and cochlear implants that would actually function well in a noisy environment).

The primary focus of my research is the class of neural computations that use the temporal character of the sounds being processed, i.e., those for which time plays an important role. As described below, this includes the neural computations employed in the processing of rhythmic sounds (e.g., speech or simple repeating patterns) and in the disentanglement of an individual rhythmic sound from other, competing sounds.

The experimental tool employed in the majority of my research is the neuroimaging technique magnetoencephalography (MEG). Unlike the more commonly known neuroimaging technique fMRI, MEG has far better temporal resolution (milliseconds instead of seconds) and is silent (a boon for investigating the auditory system), with only a relatively mild tradeoff in spatial resolution. The fast temporal resolution of MEG is especially valuable for investigating the neural processing of rhythmic sounds and their temporal dynamics. MEG is primarily sensitive to neural activity in the human cortex.

My primary research topics address the question of how the brain turns sound into hearing (surprisingly, the objective sounds impinging upon our ears are not very tightly linked to what we hear). Four of my research areas are described here: (1) investigations of how the brain solves the “Cocktail Party” problem, i.e., how, in a crowded and noisy environment, we are able to home in on a single auditory source (e.g., one person talking) while simultaneously suppressing all the remaining, interfering sounds; (2) how the brain represents complex sounds such as human speech; (3) how the brain's representations of complex sounds are built up from representations of much simpler building blocks (acoustic modulations); and (4) advances in neural signal processing.

The “Cocktail Party” Problem and Auditory Stream Segregation

Although it seems quite natural to be able to pick out and listen to a single voice in a crowded room full of people, the task is so challenging that no known computer algorithm can accomplish it. How our brains manage to identify and selectively listen to a single voice among many (and how other animals' brains accomplish similar tasks) is known as the “Cocktail Party” problem. I investigate the neuroscience underlying our ability to “un-mix” sounds in this manner.

In our experiments, subjects are presented with a complex auditory scene (e.g. two people talking at the same time, one person talking in a very noisy background, or a simple rhythmic sequence competing with other sounds), but the subjects listen to only one sound in the complex mixture. While they listen, neural activity from their entire brain is recorded. We have demonstrated that the brain encodes sounds by rapidly increasing and decreasing the level of neural activity, many times each second, in response to the sounds’ rhythms. Some parts of the brain encode the entire sound mixture, always responding the same way regardless of which sound in the complex scene the subject is actively listening to. Other parts of the brain, however, only encode the single sound being attended to, responding to the rhythm of that sound alone and ignoring all other sounds. The general outcome of these experiments does not depend on whether the subjects are listening to one speaker competing with another (Ding & Simon, PNAS, 2012; Zion Golumbic et al., Neuron, 2013), a simple rhythmic sequence competing with another sequence (Xiang et al., J Neurosci, 2010), or a simple rhythmic sequence competing with a noisy background (Elhilali, Xiang et al., PLOS Biology, 2009).

Critically, in all these examples, what is observed is the temporal locking of the neural signal to the rhythm of the sound (whether to the rhythm of the attended sound alone or to that of the entire sound mixture). This locking arises from the strongly temporal nature of the computations being performed when one sound is neurally separated from the rest. In fact, in the case of two people talking at the same time, by inspecting the neural signals we can determine with high accuracy which of the two the subject is listening to. In this sense, we are tapping into the part of the brain that represents what the subject hears, rather than the sound that enters their ears.
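
To make this decoding concrete, the sketch below shows attention decoding by stimulus reconstruction, in the spirit of the approach used in the papers cited above. It is a minimal illustration, not the lab's actual pipeline: all function names, variable names, and parameter values (number of lags, regularization) are assumptions. A linear backward model is trained to reconstruct a speech envelope from time-lagged MEG channels; the attended talker is then taken to be the one whose envelope correlates best with the reconstruction.

```python
# Minimal sketch: decoding the attended talker by stimulus reconstruction.
# Illustrative only; names and parameters are assumptions, not the lab's code.
import numpy as np

def lagged_design(meg, n_lags):
    """Stack time-lagged copies of each MEG channel into a design matrix."""
    n_ch, n_t = meg.shape
    cols = []
    for lag in range(n_lags):
        shifted = np.zeros((n_ch, n_t))
        shifted[:, lag:] = meg[:, :n_t - lag]   # channel delayed by `lag` samples
        cols.append(shifted)
    return np.vstack(cols).T                    # (n_times, n_channels * n_lags)

def ridge_fit(X, y, lam=1.0):
    """Ridge regression: weights mapping lagged MEG to the speech envelope."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def decode_attention(meg, env_a, env_b, w, n_lags):
    """Reconstruct the envelope from MEG; pick the talker it matches best."""
    recon = lagged_design(meg, n_lags) @ w
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return "A" if r_a > r_b else "B"

# Synthetic demo: 10 channels that follow talker A's envelope plus noise.
rng = np.random.default_rng(0)
env_a = rng.standard_normal(1000)
env_b = rng.standard_normal(1000)
meg = np.vstack([env_a + 0.5 * rng.standard_normal(1000) for _ in range(10)])
w = ridge_fit(lagged_design(meg, 10), env_a)
print(decode_attention(meg, env_a, env_b, w, 10))   # -> "A"
```

In practice, the decoder would be trained and tested on separate data, with the lags and regularization chosen by cross-validation.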

Neural Computations in Speech Processing by the Brain

My research also examines the auditory aspects of how the brain processes and encodes speech. The focus on auditory, rather than linguistic, aspects allows our findings to predict other animals' neural processing of arbitrary complex sounds (beyond just human processing of speech). As in the “cocktail party” research described above, the temporal dynamics of the speech, and of the neural computations necessary to process it, are important beyond the processing of the speech sound itself. We first investigated the fundamental temporal processing of speech presented by itself (Ding & Simon, J Neurophysiol, 2012), before investigating its processing in the presence of competing speakers (above) or in the presence of noise (Ding & Simon, J Neurosci, 2013).

In all these cases, not only can we use the temporal rhythm of the speech to predict the temporal profile of the neural responses (at rates of many changes per second), but we can also use the temporal rhythm of the neural responses to reconstruct the acoustic envelope of the speech being listened to. Strikingly, when the speech is embedded in a noisy background, the reconstruction matches the clean speech better than it matches the actual acoustic stimulus (the noisy speech). This suggests that it is really the cleaned (“de-noised”) speech that is represented in this (higher-level) part of the brain, rather than the acoustic stimulus, and that once again we are tapping into the part of the brain that represents what we hear, rather than the sounds that entered our ears.
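
The forward direction of this prediction can be illustrated with a linear temporal response function (TRF): a kernel that, convolved with the speech envelope, approximates the neural response. The following sketch is illustrative only (the function name, lag count, and regularization are assumptions); it recovers a known kernel from synthetic data.

```python
# Minimal sketch of forward TRF estimation; illustrative assumptions throughout.
import numpy as np

def estimate_trf(envelope, response, n_lags, lam=1.0):
    """Least-squares TRF: response[t] ~ sum_l trf[l] * envelope[t - l]."""
    n_t = envelope.shape[0]
    X = np.zeros((n_t, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = envelope[:n_t - lag]     # envelope delayed by `lag`
    trf = np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ response)
    return trf, X @ trf                         # kernel and predicted response

# Synthetic demo: a response built from a known kernel is recovered.
rng = np.random.default_rng(1)
env = rng.standard_normal(2000)
true_trf = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
resp = np.convolve(env, true_trf)[:2000] + 0.1 * rng.standard_normal(2000)
trf, pred = estimate_trf(env, resp, n_lags=5)
print(np.round(trf, 2))   # approximately [0, 0.5, 1, 0.5, 0]
```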

Neural Computations in Acoustic Modulation Processing by the Brain

Speech, while rhythmic, is highly dynamic compared to simple repeating rhythms. Using principles first put forward by the French mathematician Fourier, however, it can be shown that simple acoustic rhythmic modulations can be viewed as the building blocks of speech. For this reason, we have also investigated the neural computations involved in the temporal processing of simple acoustic rhythmic modulations (Wang, Ding et al., J Neurophysiol, 2012) and, more critically, how the neural processing of these simple modulations is altered as more and more modulations are added (Xiang et al., JASA Expr Lett, 2013; Ding et al., J Neurophysiol, 2009; Luo et al., J Neurophysiol, 2007), i.e., as the sounds become more and more speech-like.
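
As a concrete illustration of this Fourier view, the sketch below extracts the slow amplitude envelope of a sound and computes the envelope's own spectrum (its modulation spectrum); for a tone amplitude-modulated at 4 Hz, that spectrum peaks at 4 Hz. The function name, filter choice, and cutoff are illustrative assumptions.

```python
# Minimal sketch: the modulation spectrum of a sound's amplitude envelope.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def modulation_spectrum(sound, fs, cutoff=64.0):
    """Slow envelope of `sound`, plus the magnitude spectrum of that envelope."""
    envelope = np.abs(hilbert(sound))           # instantaneous amplitude
    b, a = butter(4, cutoff / (fs / 2))         # keep only slow modulations
    envelope = filtfilt(b, a, envelope)
    spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
    freqs = np.fft.rfftfreq(envelope.size, d=1 / fs)
    return envelope, freqs, spectrum

# Demo: a 200 Hz tone, amplitude-modulated at 4 Hz, for 2 seconds.
fs = 1000
t = np.arange(2 * fs) / fs
sound = (1 + np.cos(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 200 * t)
_, freqs, spec = modulation_spectrum(sound, fs)
print(freqs[np.argmax(spec[1:]) + 1])   # ~4.0 (Hz), the modulation rate
```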

Novel Signal Processing Techniques

A fourth research area crosses from basic science into engineering applications: signal processing techniques that enhance the neural signal in the (sometimes quite noisy) MEG recordings. Through the development of new techniques, we have made substantial advances in reducing noise from three very different origins: external magnetic and vibrational sources, the internal electronics of our sensors, and the rest of the brain itself, i.e., the neural signals from the bulk of the brain not actively involved in the specific task being investigated. Efficient algorithms for these techniques were put forward in a trilogy of papers (de Cheveigné and Simon, J Neurosci Methods, 2007; 2008a; 2008b).
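
To give the flavor of one such technique: MEG systems typically include reference sensors that record environmental noise but little or no brain signal, and one family of methods (in the spirit of the time-shift techniques in the papers above) subtracts from each data channel whatever the time-shifted reference channels can linearly explain. The sketch below is only a schematic of that idea, not the published algorithms; all names and parameters are assumptions.

```python
# Minimal sketch: remove environmental noise seen by reference sensors.
# Schematic only; not the published algorithms.
import numpy as np

def denoise_with_references(data, refs, n_shifts=5, lam=1e-6):
    """Project out whatever time-shifted reference channels linearly explain.

    data: (n_channels, n_times) brain sensors
    refs: (n_refs, n_times) reference sensors seeing only environmental noise
    """
    n_refs, n_t = refs.shape
    R = np.zeros((n_refs * n_shifts, n_t))      # basis of shifted references
    for s in range(n_shifts):
        R[s * n_refs:(s + 1) * n_refs, s:] = refs[:, :n_t - s]
    G = R @ R.T + lam * np.eye(R.shape[0])
    W = np.linalg.solve(G, R @ data.T)          # least-squares projection
    return data - (R.T @ W).T                   # residual = cleaned channels

# Demo: brain signal contaminated by mixed-in reference noise.
rng = np.random.default_rng(2)
noise = rng.standard_normal((3, 5000))
brain = 0.1 * rng.standard_normal((20, 5000))
data = brain + rng.standard_normal((20, 3)) @ noise
clean = denoise_with_references(data, noise)
print(np.var(data - brain), np.var(clean - brain))   # noise power drops sharply
```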

Earlier Research in Theoretical Physics

Before my career in neuroscience research, I was an active researcher in theoretical physics. My physics training and background have provided me with a powerful set of mathematical tools, and with the knowledge of when (and when not) to apply them. In a few cases, my physics background has even been directly beneficial to my neuroscience research (Zhuo et al., NeuroImage, 2012; Aytekin et al., Neural Computation, 2007).

 

Web design by Maggie Antonsen

Jonathan Simon

Photo by Emerald  Brooks, courtesy of  NACS