New preprint: The visual cortex extracts spectral fine details from silent speech

A guest blog post by Nina Suess

In our new preprint, we tried to find out what speech-related information can be extracted from silent lip movements by the human brain and how this information is differentially processed in healthy ageing.

It has already been shown that the visual cortex is able to track the unheard speech envelope that accompanies lip movements. But the acoustic signal itself is much richer in detail, showing modulations of the fundamental frequency and the resonant frequencies. Those frequencies are usually seen in the spectrogram and are crucial for the formation of speech sounds. Recent behavioural evidence suggests that modulations of those frequencies (so-called “spectral fine details”) can also be extracted from the observation of lip movements. This raises the interesting question of whether this information is also represented at the level of the visual cortex. We therefore aimed to investigate whether the human cortex can extract those acoustic spectral fine details just from visual speech, and how this changes as a function of age.

To answer this question, we presented participants with muted videos of a person speaking and asked them to pay attention to the lip movements. We used intelligibility (forward videos vs. backward videos) to investigate whether the human brain tracks the unheard spectral acoustic modulations of speech, given that only forward speech is intelligible and therefore induces speech-related processes. We calculated coherence between brain activity, the lip movement signal and the omitted acoustic signals (the speech envelope, the fundamental frequency and the resonant frequencies modulated near the lips).
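For readers curious about the analysis logic, here is a minimal, purely illustrative sketch of a coherence computation between a brain signal and an unheard speech envelope. It uses simulated stand-in signals and an assumed sampling rate and frequency band; it is not the preprint's actual source-level MEG pipeline.

```python
# Minimal sketch of the coherence idea (toy data, not the actual analysis pipeline):
# estimate coherence between a (simulated) visual-cortex time course and the
# unheard speech envelope that accompanied the muted lip movements.
import numpy as np
from scipy.signal import coherence

fs = 150.0                      # assumed sampling rate in Hz
t = np.arange(0, 60, 1 / fs)    # one minute of data

rng = np.random.default_rng(0)
envelope = rng.standard_normal(t.size)                 # stand-in for the unheard speech envelope
brain = 0.3 * envelope + rng.standard_normal(t.size)   # stand-in for a visual-cortex signal

f, coh = coherence(brain, envelope, fs=fs, nperseg=int(2 * fs))

# Average coherence in an assumed 1-7 Hz range, where speech envelope modulations live
band = (f >= 1) & (f <= 7)
print(f"Mean 1-7 Hz coherence: {coh[band].mean():.3f}")
```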

We could identify two main findings:

1) The visual cortex is able to track unheard acoustic information that usually accompanies lip movements

We replicated the findings from Hauswald et al. (2018) indicating that the visual cortex is able to track the unheard acoustic speech envelope just by observing lip movements. Crucially, we found that the visual cortex (Figure 1A) is also able to track the unheard modulations of the resonant frequencies (or formants) and the pitch (or fundamental frequency) linked to intelligible lip movements (Figure 1B). These results show that unheard spectral fine details (along with the unheard acoustic envelope) are transformed from a mere visual into a phonological representation, which strengthens the idea of the visual cortex as a “supporting” brain region for enhanced auditory speech understanding.

Figure 1

2) Ageing significantly affects the ability to track unheard resonant frequencies

Importantly, only the processing of intelligible unheard resonant frequencies decreases significantly with age, in the visual and also in the cingulate cortex (Figure 2A, 2B and 2D). This is not the case for the processing of the unheard speech envelope, the fundamental frequency or the purely visual information carried by the lip movements. This indicates that ageing particularly affects the ability to derive spectral dynamics in the frequency range of formants that are modulated near the lips. There is a clear difference between younger participants, who distinguish very clearly between intelligible and unintelligible speech (Figure 2C), and older participants, who can no longer distinguish between those two conditions.


Figure 2

These results provide new insights into speech perception under natural conditions. Until now, most research has focused on the decline of auditory speech processing abilities with age, but far less attention has been paid to how visual speech contributes to preserved speech understanding, especially under adverse conditions. Our results fit well with studies showing a decline of spectral processing with age as a unisensory phenomenon, and we add evidence that this declined processing may also be a multisensory problem involving both the auditory and the visual senses.

For questions and comments please email Nina: nina.suess@sbg.ac.at or leave a comment below.

New preprint: The brain separates auditory and visual “meanings” of words

In our new preprint, we tried to figure out whether word meanings are “the same” in the brain regardless of whether we hear a spoken word or lip-read the same word by watching a speaker’s face.

To answer this, participants did the same task in two conditions: auditory and visual. In the auditory condition, they heard a speaker say a sentence. In the visual condition, they just saw the speaker say the sentence without sound (lip reading). In both conditions, they then chose from four words which one they had understood in the sentence.

In the auditory condition, the speech was embedded in noise so that participants would misunderstand words in some cases (on average, they heard the correct word in 70% of trials).

In the visual condition, performance was also 70% correct on average. But lip reading skills vary widely in the population, and this is something we also saw in our data: performance in the lip reading task covered the whole possible range (from chance level to almost 100% correct). Needless to say, our participants were all proficient verbal speakers (mostly college students). Quite some time ago, the idea came up that the variability in lip reading reflects something other than normal speech perceptual abilities. Is it therefore possible that the processing of auditory and visual words is completely different in the brain?

To answer this question, we recorded our participants’ brain activity while they did the comprehension task. We used the magnetoencephalogram (MEG), which detects changes in magnetic fields outside the head that are produced by neural activity.

To analyse the brain’s activity during the perception of auditory and visual words, we used a classification approach: first, we tried to reconstruct which word participants had perceived by comparing the waveform patterns in their brains (stimulus classification, or encoding). Second, we analysed which of the classification patterns we found predicted whether participants actually perceived the correct word.
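As a rough illustration of this kind of stimulus classification, the sketch below decodes which of four words was presented from simulated brain patterns using cross-validation. All sizes and the classifier choice are assumptions made for the example; this is not the analysis code used in the preprint.

```python
# Hedged sketch of the classification logic: decode which of several words was
# presented from (simulated) MEG patterns, using cross-validation.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_trials, n_features, n_words = 200, 50, 4       # assumed sizes, for illustration only

y = rng.integers(0, n_words, n_trials)           # which word was presented on each trial
X = rng.standard_normal((n_trials, n_features))  # stand-in for brain patterns (e.g. sensor amplitudes)
X[np.arange(n_trials), y] += 1.0                 # inject a weak word-specific signal

clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
scores = cross_val_score(clf, X, y, cv=5)

print(f"Decoding accuracy: {scores.mean():.2f} (chance = {1 / n_words:.2f})")
```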

In a nutshell, two main findings emerged:

  1. Areas that encode word identity very well (sensory areas) often do not predict comprehension. Looking at it the other way round, the areas that encode the word sub-optimally (higher-order language areas) influence what we actually perceive. This is true for both auditory and visual speech.

Figure 1

As Hickok & Poeppel once pointed out, we think that the task we perform is the key that determines the results we get – and which areas are most relevant for our behaviour. In our case, higher-order language areas are most important for comprehension. But if the task were to discriminate speech sounds or lip movements, early sensory areas would probably be more task-relevant.

  2. The representations of auditory and visual word identities are largely distinct. They only overlap in a small area comprising the left temporal pole and inferior frontal gyrus (green area in the figure below). Our results therefore suggest that this small area might hold the amodal perceived meaning of a word.

Figure 2

Previous studies have often looked at brain activation across the brain using fMRI (functional magnetic resonance imaging). Activation means that something is “happening” in a brain area. These studies usually suggest that the processing of acoustic and visual speech overlaps to a large extent.

But the nature of these activations can be unclear. We think that the activation of a general language network could explain such findings, without necessarily representing specific word identities. Moreover, other studies often use categories (for example, buildings vs. animals) instead of single word meanings, which could give a different picture.

Overall, our analysis of specific word identities (meanings?) showed that our brain does very different things when we listen to someone speak or when we try to lip read. This could explain why our ability to understand acoustic speech is usually not related to our ability to lip read.

Flicker-driven brain waves and alpha rhythms – revisited

In our recent study we asked how waves that the brain produces by itself – alpha rhythms – relate to waves triggered by viewing a flickering screen. Both can be of similar frequency (~10 Hz, or ten cycles per second) and are easily recorded from the scalp with electrodes (EEG). To recap, we did not find much evidence for a strong link between the two types of brain waves (more info).

However, there are many ways to produce a flicker, and virtually any rhythmic change in a stimulus will likely drive a measurable brain wave. In experiments we typically use rhythmic changes in brightness (light on/off) or contrast (imagine flipping a checkerboard with its mirror image). Less frequently, we use changes along other dimensions of visual properties (e.g. colour, motion). But even within a given property dimension there is a lot of wiggle room in the choice of stimulation parameters – some experiments employ low-intensity, low-contrast stimuli (as we did), while others use intense, full-contrast checkerboards.

We know that the properties of the flicker can have a profound influence on the shape of the brain waves they drive. In brief, waves can look more or less sinusoidal depending on the stimulation. The best way to visualise this is to look at EEG spectra (basically a ledger of the sinusoidal oscillations that make up the EEG-recorded waves). If you only see a peak at the flicker rate (say 10 Hz), then the brain wave elicited by the stimulation will be nearly sinusoidal. If other peaks show up at 20, 30 Hz and so on (multiples of 10), then chances are high that your waveform looks more jagged.
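To make the harmonics point concrete, here is a small toy computation (assumed frequencies and parameters, not data from the study): a smooth 10 Hz sinusoid puts all its power into a single spectral peak, whereas a pulse-like on/off waveform at the same rate also produces peaks at multiples of 10 Hz.

```python
# Toy illustration: a sinusoidal 10 Hz wave has a single spectral peak, whereas a
# non-sinusoidal (pulse-like on/off) 10 Hz wave also shows peaks at its harmonics.
import numpy as np
from scipy.signal import welch, square

fs = 500.0
t = np.arange(0, 10, 1 / fs)

sinusoid = np.sin(2 * np.pi * 10 * t)                 # smooth 10 Hz wave
pulse = square(2 * np.pi * 10 * t, duty=0.2)          # jagged 10 Hz wave (brief "on" each cycle)

for name, sig in [("sinusoid", sinusoid), ("on/off pulse", pulse)]:
    f, pxx = welch(sig, fs=fs, nperseg=int(4 * fs))
    for target in (10, 20, 30):
        idx = np.argmin(np.abs(f - target))           # spectral bin closest to the target frequency
        print(f"{name}: relative power at {target} Hz = {pxx[idx] / pxx.max():.3f}")
```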

With all that in mind it is worth considering that different waveforms, produced by different flickers while keeping the frequency constant, may influence the results of an experiment. Looking at our data, is there a possibility that we would have arrived at different conclusions with an alternative flicker approach?

In our experiment we used the relatively uncommon approach of smooth contrast changes. Put briefly, we presented a slightly changed version of the stimulus on each frame (= one picture of a movie) of the stimulation. A more typical approach is to switch a stimulus on and off repeatedly to arrive at a similar presentation rate. Would this type of flicker produce a similar pattern of results?

I was able to check this in data from an older study. That experiment had a similar setup, with flickering stimuli presented on the left (rate = 10.6 Hz) and right (14.2 Hz). Crucially, here the stimuli were simply switched on and off to produce the flicker (more info can be found in the original paper). Suffice it to say that participants (N = 14) were shown a cue (left/right) telling them to focus their attention on the left or right stimulus for the rest of the trial (~3 sec). As in our recent study, I tested effects of attention on the power of the intrinsic alpha rhythm and of the flicker-driven brain waves. To do so I used the scripts available on osf.io/apsyf.
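The actual analysis scripts are on osf.io/apsyf; purely to illustrate the logic of the contrast, here is a hedged toy sketch (simulated single-channel data, assumed parameters) that compares alpha-band power and power at one flicker frequency between attend-left and attend-right trials.

```python
# Toy sketch of the attention contrast (simulated data, not the osf.io/apsyf scripts):
# compare alpha-band power and flicker-driven power between attend-left and
# attend-right trials, using per-trial Welch spectra of one EEG channel.
import numpy as np
from scipy.signal import welch

fs, n_trials, trial_len = 250.0, 200, 3.0            # assumed recording parameters
t = np.arange(0, trial_len, 1 / fs)
rng = np.random.default_rng(2)

attend_left = rng.integers(0, 2, n_trials).astype(bool)   # cue on each trial
trials = rng.standard_normal((n_trials, t.size))           # stand-in for one EEG channel
gain = np.where(attend_left, 1.5, 1.0)[:, None]            # attended stimulus drives a larger response
trials += gain * 0.4 * np.sin(2 * np.pi * 10.6 * t)        # left stimulus (10.6 Hz) response
trials += 0.4 * np.sin(2 * np.pi * 14.2 * t)               # right stimulus (14.2 Hz) response

f, pxx = welch(trials, fs=fs, nperseg=t.size, axis=-1)

def band_power(lo, hi):
    """Mean power per trial within a frequency band."""
    return pxx[:, (f >= lo) & (f <= hi)].mean(axis=1)

alpha = band_power(8, 13)                 # intrinsic alpha band
flicker_left = band_power(10.4, 10.8)     # band around the 10.6 Hz (left) flicker response

for name, vals in [("alpha band", alpha), ("10.6 Hz flicker response", flicker_left)]:
    print(f"{name}: attend-left = {vals[attend_left].mean():.2f}, "
          f"attend-right = {vals[~attend_left].mean():.2f}")
```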

Figure: alpha rhythm results

Above, we see a very typical pattern in the scalp map: the power of the alpha rhythm lateralises according to the focus of attention. Focusing on the left stimulus reduces alpha power over the right hemisphere and vice versa (although the effect seems relatively weak for the left hemisphere). The spectra show higher alpha power in the 8–13 Hz alpha frequency band when participants ignored the contralateral stimulus position. Interestingly, we also see that the stimulation is intense enough to produce power peaks visible in the spectra of what we have called ‘ongoing’ power. Note that these peaks do not necessarily carry the alpha suppression effect (10.6 Hz peak, right hemisphere, purple spectra) and may even show a reversed pattern (14.2 Hz peak, left hemisphere, orange spectra).

Figure: stimulus-driven response (SSR) results

The results for the stimulus-driven waves also look very similar to our first report: a measure of how well the brain tracks the flicker on each side shows clear peaks at the stimulation frequencies of 10.6 and 14.2 Hz (and a harmonic). Both responses increase when participants focus on the respective driving stimulus. Scalp maps give an impression of this effect* across recording electrodes.

In short, despite the differences in stimulation, the patterns of results from both experiments are very similar. Thus, different stimulation approaches may produce different waveforms, while the effects the experimenter intends to measure on those waveforms can remain comparable (at least in our case, for visuo-spatial attention).


Thanks to Matt Davidson for prompting this re-analysis on twitter.

* Note that these maps look different from the ones published here. This highlights the influence of factors such as stimulus intensity, location and frequency on the variability of how attention effects project to the scalp.

Alpha rhythms: Some slow down, some grow stronger

Researchers usually assume that alpha brain waves behave fairly consistently over time. In a new study, led by Chris Benwell and just accepted for publication in NeuroImage, we find that this is not necessarily true over the 1-2 hours that a typical EEG experiment lasts.

In a re-analysis of data from two previous EEG experiments, we saw that alpha shows two major trends: alpha power increases (an effect described previously) and alpha frequency decreases consistently over time. Both effects may be related to growing fatigue, the depletion of cognitive resources, or a transition from volitional to more automatic task performance.

We also found that different sources of alpha rhythms in the brain can show different trends. While some showed both power increases and frequency decreases, others showed only one of the two trends. Further analysis revealed that the two trends do not necessarily depend on each other.
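As a schematic of how such trends can be quantified (not the analysis from the paper; all parameters are assumptions), one can estimate alpha power and alpha peak frequency per block of a session and fit a line across blocks:

```python
# Toy sketch: estimate alpha power and alpha peak frequency per block of a
# (simulated) EEG session, then fit a linear trend to each measure.
import numpy as np
from scipy.signal import welch

fs, block_len, n_blocks = 250.0, 60.0, 20          # assumed session layout
t = np.arange(0, block_len, 1 / fs)
rng = np.random.default_rng(5)

powers, peak_freqs = [], []
for block in range(n_blocks):
    freq = 10.5 - 0.02 * block                     # simulated slowing over the session
    amp = 1.0 + 0.02 * block                       # simulated power increase over the session
    eeg = amp * np.sin(2 * np.pi * freq * t) + rng.standard_normal(t.size)

    f, pxx = welch(eeg, fs=fs, nperseg=int(8 * fs))
    band = (f >= 8) & (f <= 13)
    powers.append(pxx[band].mean())
    peak_freqs.append(f[band][np.argmax(pxx[band])])

blocks = np.arange(n_blocks)
print(f"power trend per block:     {np.polyfit(blocks, powers, 1)[0]:+.4f}")
print(f"frequency trend per block: {np.polyfit(blocks, peak_freqs, 1)[0]:+.4f} Hz")
```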



Figure: We usually assume that alpha varies “spontaneously” around a mean and that higher alpha leads us to observe different behaviour. This relationship, however, could simply be a consequence of both alpha and behaviour changing “deterministically”, i.e. trending over time. In experiments, we likely observe a “mix” of both sources of variance (also see Benwell et al. 2017 or here).


These findings need to be taken into account when testing for links between brain rhythms (brain state) and behaviour – a link might be reported accidentally just because both brain state and behaviour change over time – and when attempting to manipulate alpha through stimulation: stimulation frequencies may need to be adapted on the fly, and stimulation will need to target a specific alpha generator while leaving others unaffected.
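To make the first caveat concrete, here is a minimal sketch (simulated numbers, nothing from the actual datasets) showing how a shared time-on-task trend alone can create an apparent alpha-behaviour correlation that largely vanishes once both variables are detrended:

```python
# Hedged sketch: a spurious alpha-behaviour correlation induced purely by shared
# time-on-task trends disappears once a linear trend is removed from both variables.
import numpy as np
from scipy.signal import detrend
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
n_trials = 600
time = np.arange(n_trials)

alpha_power = 1.0 + 0.002 * time + 0.3 * rng.standard_normal(n_trials)      # drifts up over the session
reaction_time = 0.5 + 0.0005 * time + 0.05 * rng.standard_normal(n_trials)  # slows down over the session

r_raw, _ = pearsonr(alpha_power, reaction_time)
r_detrended, _ = pearsonr(detrend(alpha_power), detrend(reaction_time))

print(f"raw correlation: r = {r_raw:.2f}")          # inflated by the shared trend
print(f"after detrending: r = {r_detrended:.2f}")   # close to zero here, as it should be
```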

 

Flicker-driven brain waves and alpha rhythms

[17 Feb 2019]

Our manuscript Stimulus-driven brain rhythms within the alpha band: The attentional-modulation conundrum has just been accepted for publication in the Journal of Neuroscience. We show that stimulus-driven and intrinsic brain rhythms in the ~10 Hz range (alpha) can be functionally segregated. Briefly put, while one goes up the other one goes down.

In an experiment, we recorded the brain waves of our participants while they were watching a screen with two stimuli. One, shown on the left, flickered at a rate of 10 Hz and the other, shown on the right, flickered at a rate of 12 Hz. (A 10 Hz flicker means that the stimulus cycles through a change in appearance, or is simply switched on and off, 10 times per second.) A very prominent notion has it that this type of visual stimulation is capable of taking possession of, or “entraining”, the brain’s intrinsic alpha rhythm. The alpha rhythm can be characterised by its *amplitude* – the difference between peaks and troughs or, bluntly put, how strong it is – and its *phase* – when to expect a peak or trough based on its periodicity. From an entrainment perspective, alpha phase is assumed to lock on and align precisely to the periodicity of the visual stimulation.
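For readers who want to see what *amplitude* and *phase* mean in practice, here is a small illustration (simulated signal, assumed filter settings; not the analysis from the paper): band-pass the data around 10 Hz and take the analytic signal via the Hilbert transform.

```python
# Illustration of "amplitude" and "phase" of an alpha-band signal:
# band-pass filter around 10 Hz, then take the analytic signal (Hilbert transform).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 250.0
t = np.arange(0, 5, 1 / fs)
rng = np.random.default_rng(4)

eeg = np.sin(2 * np.pi * 10 * t) * (1 + 0.5 * np.sin(2 * np.pi * 0.3 * t))  # waxing/waning 10 Hz
eeg += 0.5 * rng.standard_normal(t.size)                                     # plus noise

b, a = butter(4, [8, 13], btype="bandpass", fs=fs)
alpha = filtfilt(b, a, eeg)

analytic = hilbert(alpha)
amplitude = np.abs(analytic)      # how strong the rhythm is at each moment
phase = np.angle(analytic)        # where in the cycle (peak vs. trough) we are

print(f"mean alpha amplitude: {amplitude.mean():.2f}")
print(f"phase range: {phase.min():.2f} to {phase.max():.2f} rad")
```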

Note that alpha itself has been studied for almost a century, and alpha phase has been tied to another exciting idea: how our brain processes visual input could be more akin to a camera than to the continuous make-believe of our daily experience. Here, alpha works as a pacemaker that cuts real-world continuity into perceptual samples or frames, just like the still frames of a movie. In line with this idea, experiments have shown that we seem to be less likely to “see” brief stimuli that pop up during one part of the alpha cycle – in the camera analogy, when the shutter is down – and more likely during another part, i.e. when the shutter is open.

Now, the *combination* of perceptual sampling and entrainment puts experimenters in a formidable position to study alpha’s role in perception. It allows them to manipulate alpha phase and exactly time the presentation of stimuli accordingly. Being able to entrain alpha (or other rhythms) through rhythmic visual stimulation would thus be a versatile and easy-to-apply tool – but does it really work?

In short, our experiment adds to a line of recent studies that challenge straightforward alpha entrainment by visual flicker. Our main assumption was this: if it looks like alpha and behaves like alpha, then it should be alpha. *It* refers to the brain waves elicited by watching a 10 Hz flicker. Because the brain response shows up as a 10-Hz rhythm in the EEG, it does *look* like alpha – especially if you look at it in the frequency domain, where it produces a neat 10-Hz spectral peak. “Does it behave like alpha?” we translated into “Does it have the same function?”

One very well documented effect is that alpha power (its strength) shifts according to where we attend. If we focus our attention somewhere to our left (without actually looking there), then alpha power will go down in our right visual brain – due to the cross-wiring of our visual system from eye to cortex, this is where our left visual world is processed. This alpha decrease works like opening the gates for visual input to venture into further stages of processing. Simultaneously, alpha *increases* in the left visual brain, figuratively closing the gates on unattended, irrelevant sights to our right.

Would a brain response driven by our 10/12 Hz stimulation show a similar effect? If so, that would be strong evidence for a close relationship between spontaneous and stimulus-driven alpha brain waves. Using rhythmic flicker to control alpha experimentally would then seem like a readily available manipulation. That was not what we found, though. On the contrary – we were able to switch between alpha and the stimulus-driven brain waves using slightly different data analysis approaches. Also, attention had the known suppressive effect on alpha, while the corresponding (i.e. same-side) stimulus-driven brain response increased.

These results led us to conclude that we are looking at two concurrent neural phenomena, alpha and flicker-driven brain responses. And each one of them seems to provide us with a different perspective on how attention alters our perception.

Find the specifics and references here.


Note 1: Of course, our results do not rule out alpha entrainment; they only show that a stimulus-driven brain wave in the alpha range should not be regarded as sufficient evidence for alpha entrainment. In the paper (and in previous literature) we discuss several additional conditions that may need to be satisfied for the phenomenon to arise.

Note 2: Data and code to reproduce our results are available here. With minor modifications this code should be applicable to other datasets.


Disclaimer: Views expressed in this digest are mine (CK) and not necessarily shared in all their nuances between the co-authors of the manuscript.