visual speech – A/C Keitel

A guest blog post by Nina Suess

In our new preprint, we investigated which speech-related information the human brain can extract from silent lip movements, and how the processing of this information changes in healthy ageing.

It has already been shown that the visual cortex can track the unheard speech envelope that accompanies lip movements. However, the acoustic signal itself is much richer in detail, containing modulations of the fundamental frequency and of the resonant frequencies. These frequencies are visible in the spectrogram and are crucial for the formation of speech sounds. Recent behavioural evidence suggests that modulations of these frequencies (so-called “spectral fine details”) can also be extracted by observing lip movements. This raises the interesting question of whether this information is also represented at the level of the visual cortex. We therefore aimed to investigate whether the human cortex can extract these acoustic spectral fine details from visual speech alone, and how this ability changes as a function of age.
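For readers less familiar with the term, the “speech envelope” is the slow rise and fall of a sound's overall loudness. A common way to compute it is to take the magnitude of the analytic signal (via the Hilbert transform) and low-pass filter the result. The sketch below is illustrative only: the function name `speech_envelope`, the 10 Hz cutoff and the toy signal (a 150 Hz tone standing in for a voice's fundamental frequency) are assumptions, not the preprocessing used in the preprint. Tracking the fundamental frequency or the formants would additionally require a pitch or formant tracker, which is not shown here.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def speech_envelope(audio, fs, cutoff_hz=10.0):
    """Amplitude envelope of a mono signal (hypothetical helper).

    Takes the magnitude of the analytic signal (Hilbert transform) and
    low-pass filters it so that only slow, syllable-rate modulations remain.
    """
    envelope = np.abs(hilbert(audio))
    sos = butter(4, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, envelope)


# Toy signal, not real speech: a 150 Hz tone (a stand-in for the fundamental
# frequency) whose loudness rises and falls 4 times per second.
fs = 16000
t = np.arange(0, 2.0, 1 / fs)
modulator = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))
audio = modulator * np.sin(2 * np.pi * 150 * t)

env = speech_envelope(audio, fs)
inner = slice(fs // 4, -fs // 4)  # ignore samples near the edges of the recording
r = np.corrcoef(env[inner], modulator[inner])[0, 1]
print(f"correlation between recovered envelope and true modulation: {r:.3f}")
```

The recovered envelope closely follows the 4 Hz loudness modulation while discarding the 150 Hz carrier, which is the sense in which the envelope captures the slow rhythm of speech but not its spectral fine details.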

To answer this question, we presented participants with muted videos of a person speaking and told them to pay attention to the lip movements. We manipulated intelligibility (forward vs. backward videos) to investigate whether the human brain tracks the unheard spectral acoustic modulations of speech: only forward speech is intelligible and therefore induces speech-related processes. We calculated the coherence between brain activity and the lip movement signal, as well as between brain activity and the omitted acoustic signals (the speech envelope, the fundamental frequency and the resonant frequencies modulated near the lips).
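Coherence measures, frequency by frequency, how consistently two signals co-vary (0 = no linear relationship, 1 = perfect relationship at that frequency). The toy example below uses SciPy's magnitude-squared coherence (Welch method) on simulated signals. Everything here is an assumption for illustration: the sampling rate, the 4 Hz “speech-related” rhythm, the 100 ms lag and the unrelated-noise control are made up, and the preprint's actual analysis (on recorded brain activity, contrasting forward and backward videos) is not reproduced.

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(0)
fs = 100.0                     # assumed sampling rate after downsampling (Hz)
n = int(120 * fs)              # two minutes of toy data
t = np.arange(n) / fs

# Hypothetical "speech-related" signal: a 4 Hz rhythm plus broadband noise,
# standing in for, e.g., a lip-aperture trace or an unheard speech envelope.
stimulus = np.sin(2 * np.pi * 4 * t) + 0.5 * rng.standard_normal(n)

# Toy "brain" signals: one follows the stimulus with a 100 ms lag plus noise,
# the other is unrelated noise (no tracking at all).
lag = int(0.1 * fs)
brain_tracking = 0.8 * np.roll(stimulus, lag) + rng.standard_normal(n)
brain_unrelated = rng.standard_normal(n)

# Magnitude-squared coherence (Welch method; Hann windows, 50 % overlap by default).
f, coh_tracking = coherence(stimulus, brain_tracking, fs=fs, nperseg=256)
_, coh_unrelated = coherence(stimulus, brain_unrelated, fs=fs, nperseg=256)

band = (f >= 3) & (f <= 5)     # frequency band around the 4 Hz rhythm
print(f"mean coherence 3-5 Hz, tracking:  {coh_tracking[band].mean():.2f}")
print(f"mean coherence 3-5 Hz, unrelated: {coh_unrelated[band].mean():.2f}")
```

Because coherence is computed from magnitudes of cross-spectra, a constant lag between stimulus and brain response (much shorter than a segment) barely lowers it, which is one reason it is a convenient measure of “tracking”.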

We identified two main findings:

1) The visual cortex is able to track unheard acoustic information that usually accompanies lip movements

We replicated the findings of Hauswald et al. (2018), which indicate that the visual cortex can track the unheard acoustic speech envelope just by observing lip movements. Crucially, we found that the visual cortex (Figure 1A) can also track the unheard modulations of the resonant frequencies (or formants) and of the pitch (or fundamental frequency) linked to intelligible lip movements (Figure 1B). These results show that unheard spectral fine details (along with the unheard acoustic envelope) are transformed from a purely visual into a phonological representation, which strengthens the idea of the visual cortex as a “supporting” brain region for enhanced auditory speech understanding.

2) Ageing significantly affects the ability to track unheard resonant frequencies

Importantly, only the processing of intelligible unheard resonant frequencies decreases significantly with age, in both the visual and the cingulate cortex (Figures 2A, 2B and 2D). This is not the case for the processing of the unheard speech envelope, the fundamental frequency or the purely visual information carried by the lip movements. This indicates that ageing especially affects the ability to derive spectral dynamics in the frequency range of the formants that are modulated near the lips. There is a clear difference between younger participants, who distinguish very clearly between intelligible and unintelligible speech (Figure 2C), and older participants, who no longer distinguish between the two conditions.
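One simple way to express an age effect like this is to compute, for each participant, an intelligibility effect (coherence for forward minus backward videos) and regress it on age. The post does not say which statistics the preprint used, so treat this as a hypothetical sketch: the function name `intelligibility_effect_vs_age`, the use of an ordinary linear regression, and the input format (one coherence value per participant and condition) are all assumptions.

```python
import numpy as np
from scipy.stats import linregress


def intelligibility_effect_vs_age(coh_forward, coh_backward, ages):
    """Relate each participant's intelligibility effect to age (hypothetical).

    coh_forward / coh_backward: one coherence value per participant, e.g. the
    mean over a region of interest and a frequency band, for forward and
    backward videos. Returns the per-participant effect and a linear fit of
    that effect on age.
    """
    effect = np.asarray(coh_forward, dtype=float) - np.asarray(coh_backward, dtype=float)
    fit = linregress(np.asarray(ages, dtype=float), effect)
    return effect, fit


# Toy numbers chosen only to exercise the function; they are NOT study data.
ages = [20, 30, 40, 50, 60, 70]
coh_forward = [0.16, 0.15, 0.14, 0.13, 0.12, 0.11]
coh_backward = [0.10] * 6

effect, fit = intelligibility_effect_vs_age(coh_forward, coh_backward, ages)
print(f"slope = {fit.slope:.4f} per year, r = {fit.rvalue:.2f}")
```

A negative slope means the forward-minus-backward difference shrinks with age, i.e. older participants' tracking distinguishes less between intelligible and unintelligible speech. A real analysis would also need to account for multiple brain regions and frequencies, which this sketch ignores.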

Figure 2

These results can provide new insights into speech perception under natural conditions. Until now, most research has focused on the age-related decline of auditory speech processing abilities, while far less attention has been paid to how visual speech contributes to preserved speech understanding, especially under adverse conditions. Our results fit well with studies showing an age-related decline in spectral processing as a unisensory phenomenon, and add evidence that this decline might also be a multisensory problem involving both the auditory and the visual senses.

For questions and comments please email Nina: nina.suess@sbg.ac.at or leave a comment below.


New preprint: The visual cortex extracts spectral fine details from silent speech