New preprint: The brain separates auditory and visual “meanings” of words

In our new preprint, we asked whether word meanings are “the same” in the brain regardless of whether we hear a spoken word or lip-read the same word by watching a speaker’s face.

To answer this, participants did the same task in two conditions: auditory and visual. In the auditory condition, they heard a speaker say a sentence. In the visual condition, they only saw the speaker say the sentence, without sound (lip reading). In both conditions, they then chose, from four alternatives, the word they had understood in the sentence.

In the auditory condition, the speech was embedded in noise so that participants would misunderstand words in some cases (on average, they heard the correct word in 70% of trials).

In the visual condition, performance was also 70% correct on average. But lip reading skills vary enormously across the population, and we saw this in our data too: performance in the lip reading task covered the whole possible range, from chance level to almost 100% correct. Needless to say, our participants were all proficient speakers (mostly college students). It has long been suggested that this variability in lip reading reflects something other than normal speech perception abilities. Is it therefore possible that the processing of auditory and visual words is completely different in the brain?

To answer this question, we recorded our participants’ brain activity while they did the comprehension task. We used the magnetoencephalogram (MEG), which detects changes in magnetic fields outside the head that are produced by neural activity.

To analyse the brain’s activity during the perception of auditory and visual words, we used a classification approach. First, we tried to reconstruct which word participants had perceived by comparing patterns of brain activity across trials (stimulus classification, or encoding). Second, we analysed which of these classification patterns predicted whether participants actually perceived the correct word.
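The idea behind this kind of stimulus classification can be illustrated with a toy simulation. The sketch below is not our actual pipeline — the trial counts, feature dimensions, and noise level are all invented — but it shows the basic logic: assign each held-out trial to the word whose average brain pattern it most resembles.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 4 word identities, 40 trials each, 50 "sensor" features.
# Each word has a distinct underlying activity pattern plus trial noise.
n_words, n_trials, n_feat = 4, 40, 50
templates = rng.normal(size=(n_words, n_feat))
X = np.concatenate([t + 0.8 * rng.normal(size=(n_trials, n_feat))
                    for t in templates])
y = np.repeat(np.arange(n_words), n_trials)

def classify(X, y):
    """Leave-one-out nearest-centroid classification of word identity."""
    correct = 0
    for i in range(len(y)):
        mask = np.ones(len(y), dtype=bool)
        mask[i] = False
        # Class centroids estimated from all remaining trials
        centroids = np.stack([X[mask & (y == w)].mean(axis=0)
                              for w in range(n_words)])
        # Assign the held-out trial to the closest centroid
        pred = np.argmin(np.linalg.norm(centroids - X[i], axis=1))
        correct += pred == y[i]
    return correct / len(y)

acc = classify(X, y)
print(f"decoding accuracy: {acc:.2f} (chance = 0.25)")
```

Above-chance accuracy in a brain area indicates that its activity carries information about word identity; the second analysis step then asks which of these areas matter for behaviour.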

In a nutshell, two main findings emerged:

  1. Areas that encode word identity very well (sensory areas) often do not predict comprehension. Looking at it the other way round, the areas that encode the word sub-optimally (higher-order language areas) influence what we actually perceive. This is true for both auditory and visual speech.


As once pointed out by Hickok & Poeppel, we think that the task we perform is the key that determines the results we get – and which areas are most relevant for our behaviour. In our case, higher-order language areas are most important for comprehension. But if the task were to discriminate speech sounds or lip movements, early sensory areas would probably be more task-relevant.

  2. The representations of auditory and visual word identities are largely distinct. They only overlap in a small area comprising the left temporal pole and inferior frontal gyrus (green area in the figure below). Our results therefore suggest that this small area might hold the amodal perceived meaning of a word.


Previous studies have often looked at activation across the brain using fMRI (functional magnetic resonance imaging). Activation means that something is “happening” in a brain area. These studies usually suggest that the processing of acoustic and visual speech overlaps to a large extent.

But the nature of these activations can be unclear. We think that the activation of a general language network could explain such findings, without necessarily representing specific word identities. Moreover, other studies often use categories (for example, buildings vs animals) instead of single word meanings, which could give a different picture.

Overall, our analysis of specific word identities (meanings?) showed that our brain does very different things when we listen to someone speak or when we try to lip read. This could explain why our ability to understand acoustic speech is usually not related to our ability to lip read.

New preprint on speech tracking in auditory and motor cortices

The tracking of temporal information in speech is frequently used to study speech encoding in dynamic brain activity. Often, studies use traditional, generic frequency bands in their analysis (for example, the delta [1–4 Hz] or theta [4–8 Hz] bands). However, there are large inter-individual differences in speech rate. For example, audiobooks are typically narrated at 150 words per minute (2.5 Hz), while the world’s fastest speaker can talk at 637 words per minute (10.6 Hz). We therefore reasoned that speech tracking analyses should take the specific regularities (e.g. speech rate) of the stimuli into account. This is exactly what we did in this study: We extracted the time-scales of phrases, words, syllables and phonemes in our sentences and based our analyses on these stimulus-specific bands.
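To illustrate how such stimulus-specific bands can be derived, the sketch below converts per-sentence unit counts and durations into unit rates in Hz. The annotations are invented for illustration and are not our actual stimulus statistics; the point is simply that each linguistic unit occupies its own frequency range, determined by the stimuli themselves rather than by generic band definitions.

```python
# Hypothetical per-sentence annotations (made-up numbers):
# sentence durations in seconds, and unit counts per sentence.
durations = [2.2, 2.6, 2.4]
counts = {
    "phrases":   [2, 3, 2],
    "words":     [6, 7, 6],
    "syllables": [10, 12, 11],
    "phonemes":  [24, 28, 26],
}

# A unit rate of "c units in d seconds" is c / d events per second, i.e. Hz.
bands = {}
for unit, n in counts.items():
    rates = [c / d for c, d in zip(n, durations)]
    bands[unit] = (min(rates), max(rates))

for unit, (lo, hi) in bands.items():
    print(f"{unit:9s}: {lo:.1f} - {hi:.1f} Hz")
```

With numbers like these, phrases, words, syllables and phonemes fall into ordered, non-overlapping ranges, which is what allows the tracking analysis to be tied to a specific linguistic time-scale.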

Previous studies also mainly used continuous speech to analyse speech tracking. This is a fantastic, “real-world” paradigm, but it does not allow comprehension to be analysed directly. We therefore played single sentences to our participants and asked them, after each sentence, to indicate which of four words they had heard in the sentence. This way, we obtained a single-trial measure of comprehension. We also recorded participants’ brain activity with magnetoencephalography (MEG) and performed our analyses on source projections of this activity.

We show two different speech tracking effects that help participants comprehend speech, and both act concurrently at time-scales within the traditional delta band: First, the left middle temporal gyrus (MTG) tracks speech at the word time-scale, which is probably useful for word segmentation and sound-to-meaning mapping. Second, the left premotor cortex (PM) tracks speech at the phrasal time-scale, likely reflecting the use of temporal predictions during speech perception.


Previous studies have shown that the motor system is involved in predicting the timing of upcoming stimuli by using its beta rhythm. We therefore hypothesised that a cross-frequency coupling between beta power and delta phase at the phrasal time-scale could drive the effect in the motor system. This is indeed what we found, and the coupling was also directly relevant for comprehension.
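To illustrate what such a cross-frequency coupling looks like, the sketch below generates a synthetic signal in which the amplitude of a 20 Hz “beta” rhythm is locked to the phase of a 1 Hz “delta” rhythm, and quantifies the coupling with a mean-vector-length measure. This is a simplified, NumPy-only stand-in for the analysis in the paper; the frequencies, filter widths, and noise level are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, dur = 200, 60                       # sampling rate (Hz), duration (s)
t = np.arange(fs * dur) / fs

# Synthetic signal: a 1 Hz "delta" rhythm whose phase modulates
# the amplitude of a 20 Hz "beta" rhythm, plus white noise.
delta_phase = 2 * np.pi * 1.0 * t
beta_amp = 1 + 0.8 * np.cos(delta_phase)
sig = (np.cos(delta_phase)
       + beta_amp * np.cos(2 * np.pi * 20 * t)
       + 0.5 * rng.normal(size=t.size))

def pac(sig, fs, f_phase, f_amp):
    """Mean-vector-length estimate of phase-amplitude coupling."""
    def band(x, lo, hi):
        # Crude FFT band-pass: zero out bins outside [lo, hi] Hz
        X = np.fft.rfft(x)
        f = np.fft.rfftfreq(x.size, 1 / fs)
        X[(f < lo) | (f > hi)] = 0
        return np.fft.irfft(X, x.size)
    def analytic(x):
        # Analytic signal via the FFT (Hilbert transform)
        X = np.fft.fft(x)
        h = np.zeros(x.size)
        h[0] = 1
        h[1:x.size // 2] = 2
        h[x.size // 2] = 1
        return np.fft.ifft(X * h)
    phase = np.angle(analytic(band(sig, f_phase - 0.5, f_phase + 0.5)))
    amp = np.abs(analytic(band(sig, f_amp - 4, f_amp + 4)))
    # Length of the phase-weighted amplitude vector, normalised by mean amplitude
    return np.abs(np.mean(amp * np.exp(1j * phase))) / amp.mean()

coupled = pac(sig, fs, 1.0, 20.0)
control = pac(rng.normal(size=t.size), fs, 1.0, 20.0)
print(f"PAC coupled={coupled:.2f}, white-noise control={control:.2f}")
```

The coupled signal yields a clearly larger coupling value than the white-noise control, which is the signature we looked for between beta power and phrasal-rate delta phase in the motor system.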

By using stimulus-specific frequency bands and single-trial comprehension, we show specific functional and perceptually relevant speech tracking processes along the auditory-motor pathway. In particular, we provide new insights regarding the function and relevance of the motor system for speech perception.

If you would like to read the full manuscript, you can find a preprint on bioRxiv here.