New technology generates synthetic speech using brain activity

Written by Olivia Stevenson (Future Science Group)

Researchers from the University of California San Francisco (UCSF; CA, USA) have developed a brain–machine interface that uses brain activity to control a virtual vocal tract, generating natural-sounding artificial speech. In the future, this could restore speech to individuals who have lost the ability to speak.
Loss of speech is common in individuals who have experienced brain injury or who have neurodegenerative diseases such as Parkinson’s disease. At present, some individuals with speech loss are able to use devices that track eye or facial muscle movements to spell out thoughts. However, these devices can be slow and are often error-prone.

In this study, published in Nature, researchers studied five volunteers who were being treated at the UCSF Epilepsy Center (CA, USA). These individuals, with intact speech, had temporary electrodes monitoring their brain activity in preparation for neurosurgery. The researchers asked these volunteers to read out hundreds of sentences, and their brain activity was recorded.

The researchers then utilized two ‘neural network’ machine-learning algorithms to convert brain activity into artificial speech. First, the team used a decoder that was able to convert brain activity recorded during speech into movements of the virtual vocal tract. Second, a synthesizer was used, which converted vocal tract movements into a synthetic voice.
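
The two-stage design described above can be illustrated with a minimal sketch. It is not the researchers' implementation: the study used neural networks, whereas the plain linear maps below are simplified stand-ins, and every name and number (decode_articulation, synthesize_acoustics, the weight matrices) is hypothetical and chosen only to show how the output of the first stage feeds the second.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def apply_linear(weights: Matrix, frame: Vector) -> Vector:
    """Multiply one feature frame by a weight matrix (one row per output)."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def decode_articulation(neural_frames: List[Vector],
                        decoder_weights: Matrix) -> List[Vector]:
    """Stage 1 (stand-in): brain activity -> vocal tract movements."""
    return [apply_linear(decoder_weights, f) for f in neural_frames]


def synthesize_acoustics(articulatory_frames: List[Vector],
                         synth_weights: Matrix) -> List[Vector]:
    """Stage 2 (stand-in): vocal tract movements -> acoustic features."""
    return [apply_linear(synth_weights, f) for f in articulatory_frames]


def brain_to_speech(neural_frames: List[Vector],
                    decoder_weights: Matrix,
                    synth_weights: Matrix) -> List[Vector]:
    """Chain the two stages: the decoder output is the synthesizer input."""
    movements = decode_articulation(neural_frames, decoder_weights)
    return synthesize_acoustics(movements, synth_weights)


# Toy example: one frame of 3 neural features, 2 articulatory features,
# and 3 acoustic features. All values are made up for illustration.
neural = [[1.0, 2.0, 3.0]]
decoder = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 1.0]]
synth = [[2.0, 0.0],
         [0.0, 1.0],
         [1.0, 1.0]]
acoustics = brain_to_speech(neural, decoder, synth)
```

The point of the intermediate step is that the vocal tract movements act as an explicit representation between brain activity and sound, rather than mapping brain activity to sound in a single step.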

Speech generated in this way was of significantly higher quality than speech generated from brain activity without simulating individuals’ vocal tracts. When given lists of 25 alternatives, transcribers were able to accurately identify 69% of synthesized words and were able to correctly transcribe 43% of sentences. This study has demonstrated that activity in the speech centers of the brain can control a synthesized imitation of an individual’s voice.

“We still have a ways to go to perfectly mimic spoken language. We’re quite good at synthesizing slower speech sounds like ‘sh’ and ‘z’ as well as maintaining the rhythms and intonations of speech and the speaker’s gender and identity, but some of the more abrupt sounds like ‘b’s and ‘p’s get a bit fuzzy. Still, the levels of accuracy we produced here would be an amazing improvement in real-time communication compared to what’s currently available,” said author Josh Chartier (UCSF).

The researchers are currently investigating the use of higher-density electrode arrays and are developing more advanced machine-learning algorithms. The next stage will be to test the virtual vocal tract with volunteers who have impaired speech. They anticipate that in the future, the device may be able to restore speech to individuals with speech loss and impairment, providing much quicker and more fluent synthetic speech than is currently available.

“People who can’t move their arms and legs have learned to control robotic limbs with their brains. We are hopeful that one day people with speech disabilities will be able to learn to speak again using this brain-controlled artificial vocal tract,” concluded Chartier.

Source: Anumanchipalli GK, Chartier J, Chang EF. Speech synthesis from neural decoding of spoken sentences. Nature 568, 493–498 (2019).