The phonemic restoration effect is a perceptual phenomenon in which, under certain conditions, sounds missing from a speech signal are restored by the brain and may be perceived as if they had actually been heard. The effect occurs when missing phonemes in an auditory signal are replaced with a noise that has the physical properties needed to mask those phonemes, creating an ambiguity. Faced with this ambiguity, the brain tends to fill in the absent phonemes. The effect can be so strong that some listeners do not even notice that phonemes are missing. It is commonly observed in conversations with heavy background noise, which makes it difficult to hear every phoneme being spoken. Different factors can change the strength of the effect, including how rich the context or linguistic cues in the speech are, as well as the listener's state, such as hearing status or age.
This effect is more important to humans than was initially thought. Linguists have pointed out that English, at least, contains many false starts and extraneous sounds. The phonemic restoration effect is the brain's way of resolving these imperfections in our speech. Without it, speech signals would need to be far more accurate, and human speech would require much more precision. In experiments, white noise is used because it stands in for these imperfections in speech. Continuity, and in turn intelligibility, is one of the most important factors in language.
The phonemic restoration effect was first documented in a 1970 paper by Richard M. Warren entitled "Perceptual Restoration of Missing Speech Sounds". The purpose of the experiment was to explain why individual phonemes masked by extraneous background sounds remained comprehensible.
“The state governors met with their respective legislatures convening in the capital city.”
In his initial experiments, Warren used the sentence above and replaced the first 's' phoneme in "legislatures" with an extraneous noise in the form of a cough. In a small group of 20 subjects, 19 did not notice that a phoneme was missing, and one misidentified the missing phoneme. This indicated that, in the absence of a phoneme, the brain filled it in through top-down processing. The phenomenon was somewhat known at the time, but no one had pinpointed why it occurred or given it a name. He repeated the experiment with the sentence:
“It was found that the wheel was on the axle.”
He replaced the 'wh' sound in "wheel" and found the same result: everyone tested wrote down "wheel". Warren continued to research the subject for the next several decades.[1]
Since Warren, much research has been done to test various aspects of the effect, including how many phonemes can be removed, what noise is played in place of the phoneme, and how different contexts alter the effect.
Neurally, the signs of interrupted or stopped speech can be suppressed in the thalamus and auditory cortex, possibly as a consequence of top-down processing by the auditory system.[2] Key aspects of the speech signal itself are thought to be resolved somewhere at the interface between auditory and language-specific areas (such as Wernicke's area), allowing the listener to determine what is being said. Such resolution is normally thought to take place at the final stages of the language processing system, but for restorative processes it remains largely unknown whether the same stages are responsible for actually filling in the missing phoneme.
Phonemic restoration is one of several phenomena demonstrating that prior knowledge stored in the brain gives it the tools to guess at missing information, in principle similar to an optical illusion. Humans and other vertebrates are believed to have evolved the ability to complete acoustic signals that are critical but communicated under naturally noisy conditions. In humans, it is not fully known at what point in the processing hierarchy the phonemic restoration effect occurs,[3] but evidence points to dynamic restorative processes already occurring for basic sound modulations presented at natural articulation rates.[4] Research using direct neurophysiological recordings from human epilepsy patients implanted with electrodes over auditory and language cortex has shown that the lateral superior temporal gyrus (STG; a core part of Wernicke's area) represents the missing sound that listeners perceive.[5] This research also demonstrated that perception-related neural activity in the STG is modulated by the left inferior frontal cortex, which contains signals that predict what sound listeners will report hearing up to about 300 milliseconds before the sound is even presented.
People with mild and moderate hearing loss were tested to see how effective phonemic restoration was for them. Those with mild hearing loss performed at the same level as normal-hearing listeners. Those with moderate hearing loss showed almost no restoration and failed to identify the missing phonemes. Because the process is top-down, these results also depend on the number of words the listener is comfortable understanding.[6]
For people with cochlear implants, acoustic simulations of the implant indicated the importance of spectral resolution.[3] When the brain uses top-down processing, it draws on as much information as it can to decide whether the filler signal in the gap belongs to the speech; with lower spectral resolution, there is less information available for a correct guess. A study[7] with actual cochlear implant users indicated that some implant users can benefit from phonemic restoration, but again they seem to need more speech information (a longer duty cycle, in this case) to do so.
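Studies of this kind typically present speech that is periodically interrupted, with the gaps left silent or filled with noise. The sketch below is a minimal illustration in Python, assuming speech is held as a list of float samples; "duty cycle" is taken here to mean the fraction of each on/off cycle during which speech is present, which is how the term is commonly used for interrupted speech. The `interrupt` helper and its parameter values are hypothetical and are not the settings used in the cited study.

```python
import random


def interrupt(signal, sample_rate, rate_hz, duty_cycle, fill_noise_std=None, seed=0):
    """Periodically interrupt a speech signal (hypothetical helper).

    Each on/off cycle lasts 1 / rate_hz seconds. Speech is kept during the
    first `duty_cycle` fraction of every cycle; the rest of the cycle is
    silenced or, if `fill_noise_std` is given, filled with zero-mean
    Gaussian white noise with that standard deviation.
    """
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in (0, 1]")
    if rate_hz <= 0 or sample_rate <= 0:
        raise ValueError("rates must be positive")
    rng = random.Random(seed)          # seeded so the noise is reproducible
    period = sample_rate / rate_hz     # samples per on/off cycle
    out = []
    for n, x in enumerate(signal):
        phase = (n % period) / period  # position within the cycle, 0 <= phase < 1
        if phase < duty_cycle:
            out.append(x)              # "on" part: original speech kept
        elif fill_noise_std is None:
            out.append(0.0)            # silent gap
        else:
            out.append(rng.gauss(0.0, fill_noise_std))  # noise-filled gap
    return out


sr = 16000
speech = [1.0] * sr  # one second of a constant placeholder "speech" signal
silent_gaps = interrupt(speech, sr, rate_hz=2.0, duty_cycle=0.75)
noisy_gaps = interrupt(speech, sr, rate_hz=2.0, duty_cycle=0.75, fill_noise_std=0.5)
```

With a 2 Hz interruption rate and a duty cycle of 0.75, 12,000 of the 16,000 samples keep the original signal. Raising the duty cycle leaves more speech in each cycle, which corresponds to the "more speech information" that implant users appeared to need.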
Age effects were studied in children and older adults to determine whether children can benefit from phonemic restoration and, if so, to what extent, and whether older adults maintain the capacity for restoration in the face of age-related neurophysiological changes.
By about age 5, children produce results comparable to those of adults, although they still do not perform quite as well. At such an early age, most information is processed bottom-up because there is less stored knowledge to draw on. Even so, children are able to use prior knowledge of words to fill in missing phonemes, despite their brains being much less developed than those of adults.[8] [9]
Older adults (older than 65 years) with no or minimal hearing loss show a benefit from phonemic restoration. In some conditions the restoration effect can be stronger in older adults than in younger adults, even when older adults' overall speech perception scores are lower. This is likely due to strong linguistic and vocabulary skills that are maintained into advanced age.[10] [11]
In children, there was no effect of gender on phonemic restoration.[8]
In adults, instead of completely replacing the phonemes, researchers masked them with tones that were informative (helped listeners pick the correct phoneme), uninformative (neither helped nor hindered listeners in selecting the correct phoneme), or misinformative (hindered listeners in picking the correct phoneme). The results showed that women were much more affected by informative and misinformative cues than men. This suggests that women are influenced by top-down semantic information more than men are.[12]
Female listeners were better able than male listeners to use a delayed informative cue at the end of a long sentence to report an earlier word that had been disrupted by noise.[13]
The effect reverses in a reverberant room, which resembles real-life conditions more closely than the quiet rooms typically used for experiments. In such a room, echoes of the spoken phonemes act as replacement noise for the missing phonemes. The added white noise that replaces the phoneme produces its own echo, and listeners perform worse.[14]
Another study by Warren examined how the duration of the replacement sound affects comprehension. Because the brain processes information optimally at a certain rate, the effect began to break down once the gap approached the length of a word. At that point the effect fails because the listener becomes aware of the gap.[15]
Much like the McGurk effect, when listeners could also see the words being spoken, they were much more likely to correctly identify the missing phonemes. As with every sense, the brain uses every piece of information it deems important to judge what it is perceiving. Drawing on the visual cues of mouth movements, the brain uses both vision and hearing in top-down processing to decide which phoneme should be heard. Vision is the dominant sense in humans and, for the most part, is the sense that assists speech perception the most.[1]
Because languages have a distinct structure, the brain has some sense of which word will come next in a well-formed sentence. When listeners heard properly structured sentences with missing phonemes, they performed much better than with nonsensical sentences lacking proper structure. This comes from the predictive role of the prefrontal cortex in determining which word should come next for the sentence to make sense. Top-down processing relies on the surrounding information in a sentence to fill in the missing information. If the sentence does not make sense to the listener, there is little at the top of the process for the listener to go on. By analogy, if a piece were missing from a jigsaw puzzle of a familiar picture, it would be easy for the brain to know what that piece should look like; if the picture showed something that made no sense and had never been seen before, the brain would have much more difficulty working out what was missing.[16] [17]
The effect works properly only when the noise replacing the phonemes is as intense as, or more intense than, the surrounding words. This becomes apparent when listeners hear a sentence whose gaps are filled with white noise repeated over and over, with the noise level increasing on each repetition: the sentence becomes clearer to the listener as the noise gets louder.[18]
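The level relationship described in the paragraph above can be made concrete with a short sketch. This is a minimal illustration in Python, assuming speech is held as a list of float samples and using RMS level as a simple stand-in for "intensity"; `rms` and `replace_with_noise` are hypothetical helpers, and this is not necessarily how the cited study calibrated its stimuli. The noise level is expressed in decibels relative to the speech outside the gap, so 0 dB means equal level and positive values mean louder noise.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def replace_with_noise(signal, start, end, level_db=0.0, seed=0):
    """Return a copy of `signal` with samples [start, end) replaced by noise.

    The noise is Gaussian white noise rescaled so that its RMS is `level_db`
    decibels relative to the RMS of the speech outside the gap.
    """
    signal = list(signal)
    if not 0 <= start < end <= len(signal):
        raise ValueError("need 0 <= start < end <= len(signal)")
    context = signal[:start] + signal[end:]
    if not context:
        raise ValueError("the gap cannot cover the whole signal")
    rng = random.Random(seed)  # seeded so the noise is reproducible
    noise = [rng.gauss(0.0, 1.0) for _ in range(end - start)]
    target = rms(context) * 10 ** (level_db / 20)  # dB -> amplitude ratio
    scale = target / rms(noise)
    return signal[:start] + [n * scale for n in noise] + signal[end:]


sr = 16000
# A 220 Hz tone stands in for a recorded sentence in this sketch.
speech = [0.3 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
# Repeated presentations with the gap noise getting louder each time.
presentations = [replace_with_noise(speech, 4000, 5600, level_db=db)
                 for db in (-12.0, -6.0, 0.0, 6.0)]
```

In this sketch only the last two presentations meet the condition described above (noise at least as intense as the surrounding speech); the earlier ones leave the noise quieter than the speech.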
In another study, a word whose 's' segment had been removed and replaced by silence was presented dichotically with a comparable noise segment. Simply put, one ear heard the full sentence without phoneme excision and the other ear heard the sentence with an 's' sound removed. This version of the phonemic restoration effect was particularly strong because the brain had much less guesswork to do: the missing information was available to the listener. Listeners reported hearing exactly the same sentence in both ears, even though one ear received a sentence with a missing phoneme.[19]
The restoration effect has been studied mostly in English and Dutch, and it appears similar in the two languages. Although no research has directly compared the effect in other languages, it is assumed to be universal across languages.