In psychology, visual capture is the dominance of vision over other sense modalities in creating a percept.[1] In this process, vision influences the other sensory systems, resulting in a perceived environment that is not congruent with the actual stimuli. Through this phenomenon, the visual system can disregard the information conveyed by another sensory system and supply a coherent explanation for the input the environment provides. Visual capture thus shapes where a sound or a touch is perceived to be: the perceived location follows what is seen rather than the auditory or tactile stimulus itself, so that the individual perceives a coherent environment.
One example of visual capture is the ventriloquism effect, which refers to the perception of speech sounds as coming from a direction other than their true direction, due to the influence of visual stimuli from an apparent speaker.[2] Thus, when the ventriloquism illusion occurs, the speaker's voice is visually captured at the location of the dummy's moving mouth (rather than the speaker's carefully unmoving mouth).[3]
Another example of visual capture occurs when a sound that would normally be perceived as moving from left to right is heard while a person is viewing a visual stimulus that is moving from right to left; in this case, both sound and stimulus appear to be moving from right to left.[4]
When two sensory stimuli are presented simultaneously, vision is capable of dominating and capturing the other. This occurs because visual cues can distract from other sensations, so that the stimulus appears to originate from the visual cue. Therefore, when multiple stimuli reach the brain at once, there is a hierarchy in which vision guides how the other sensory cues are perceived: they are experienced as though they align with the visual scene, regardless of where their original source may be. Research has found that visual and auditory reflexive spatial orienting are controlled through a common underlying neural substrate.[5] Furthermore, studies in cognitive neuroscience have shown that visual stimuli have a significant effect on perception when they are attended to. This dominance is seen again in a visual-haptic task, in which vision made better judgements of an object than physically touching it.[6] It has also been found that the amount of visual capture depends on the task: in some tasks the visual system is entirely dominant, while in others haptic cues are more prominent.[7]
The thalamus is a section of the brain responsible for relaying sensory and motor signals to the cerebral cortex. It contains specific regions dedicated to individual senses, so as stimuli pass through it, it is able to sort out the multiple parts of the environment that an individual experiences at a given moment. Two of these regions are specific to vision and hearing respectively, and they may be responsible for the order in which sensory information is coded and then perceived within the cerebral cortex.
The retina at the back of the eye detects visual stimuli, and the resulting signals travel along the optic nerve and optic tract to the lateral geniculate nucleus (LGN) within the thalamus. The information is then transmitted to the occipital lobe, where orientation and other recognizable features are processed.
The LGN is located near the medial geniculate nucleus (MGN), which is responsible for organizing auditory stimuli after a sound is heard. Because these two structures lie close to each other, it has been suggested that this may be where vision takes over the perception of an environment, resulting in visual capture. As the multiple senses are organized and the response is sent further into the brain for processing, it is possible that the visual cues are registered more strongly, so that all other senses are perceived as a function of the visual cue. The result is a cohesive experience for the individual that is driven by the visual system, which fits the definition of visual capture.
This phenomenon was first demonstrated by the Frenchman J. Tastevin in 1937, after he studied the tactile Aristotle illusion. This illusion produces the sensation of touching two objects when one crosses one's fingers and holds a single spherical object between them. Visual capture was used to explain how vision could overcome this effect and determine what is actually present.[8] [9]
Attention was again tied to visual cues in an experiment conducted by Michael Posner in 1980.[10] When a visual indicator shows in which direction a stimulus will appear, response time improves (decreases) if the indicated direction is correct. (Conversely, if the indicator is misleading, response time increases.) Attending to the cued direction produces this faster reaction even though the participant does not physically shift their gaze during the pre-stimulus indicator.
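The comparison described above can be expressed as a simple difference of mean response times. The following is a minimal sketch only: the function name cueing_effect and the reaction times are hypothetical and made up for illustration, and they are not data or analysis code from Posner's experiment.

```python
from statistics import mean


def cueing_effect(valid_rts_ms, invalid_rts_ms):
    """Return mean invalid-cue RT minus mean valid-cue RT, in milliseconds.

    A positive value means responses were faster after correct (valid)
    cues than after misleading (invalid) cues.
    """
    if not valid_rts_ms or not invalid_rts_ms:
        raise ValueError("both conditions need at least one reaction time")
    return mean(invalid_rts_ms) - mean(valid_rts_ms)


# Made-up reaction times (ms) for illustration; NOT data from Posner (1980).
valid_trials = [250, 260, 240]
invalid_trials = [300, 310, 290]

effect = cueing_effect(valid_trials, invalid_trials)
print(f"Cueing effect: {effect} ms")
```

With these invented numbers the difference is positive, which is the pattern the paragraph describes: a correct cue shortens response time and a misleading cue lengthens it.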
The evidence that vision has an impact on reaction time demonstrates that vision has a neurological effect on the attentional process. Thus, vision is capable of manipulating the perception an individual has of an environment; this perceptual manipulation is what Tastevin considered visual capture.
A number of studies have demonstrated the visual capture effect. For example, Alais and Burr (2004), using the ventriloquism effect, found that vision is capable of taking over the auditory sense, specifically when the visual stimuli are well localized.[2] This means that when the source of the sound and the visual stimulus are close together, a direct relationship forms in the perception of these separate stimuli, and they are bound into the same sensation.[11]
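One common way to model why a well-localized visual stimulus dominates is reliability-weighted cue combination, in which each cue is weighted by the inverse of its variance. The sketch below assumes that model; the text above does not state it, and the function name combine_cues and all numbers are hypothetical, chosen only for illustration.

```python
def combine_cues(visual_pos, visual_sd, auditory_pos, auditory_sd):
    """Combine a visual and an auditory location estimate.

    Each cue is weighted by its reliability (1 / variance), so the more
    precisely localized cue pulls the combined estimate toward itself.
    Returns (combined_position, visual_weight).
    """
    if visual_sd <= 0 or auditory_sd <= 0:
        raise ValueError("standard deviations must be positive")
    visual_reliability = 1.0 / visual_sd ** 2
    auditory_reliability = 1.0 / auditory_sd ** 2
    visual_weight = visual_reliability / (visual_reliability + auditory_reliability)
    combined = visual_weight * visual_pos + (1.0 - visual_weight) * auditory_pos
    return combined, visual_weight


# Illustrative values only (degrees of azimuth); not experimental data.
# Sharp visual cue at 0 degrees, coarser sound at 10 degrees.
sharp_estimate, sharp_weight = combine_cues(0.0, 1.0, 10.0, 3.0)

# Heavily blurred visual cue: now the sound is the more reliable cue.
blurred_estimate, blurred_weight = combine_cues(0.0, 9.0, 10.0, 3.0)

print(f"sharp visual:   estimate={sharp_estimate:.2f}, visual weight={sharp_weight:.2f}")
print(f"blurred visual: estimate={blurred_estimate:.2f}, visual weight={blurred_weight:.2f}")
```

Under this assumed model, the sharp visual cue receives a weight of about 0.9 and the sound is perceived close to the visual location, while blurring the visual cue reverses the weighting. This is only one possible account, but it illustrates how the degree of capture could vary with how well each stimulus is localized.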
Another example of visual capture comes from Ehrsson, Spence, & Passingham (2004), who used a rubber hand to show that vision is capable of determining how other senses react. As participants watched a rubber hand being stroked, their own hand was stroked in a similar fashion, leading them to attribute their sensation to what they were watching rather than to what was happening to their own body. Therefore, when the rubber hand was then threatened, for example by hitting it with a hammer, participants felt an immediate shock and pain, fearing that their own hand was in danger. This serves as evidence that the visual system is capable not only of manipulating where an individual perceives another sense to be coming from, but also of manipulating how one reacts to an experience, given that vision is taking charge.[12] [13]
A study by Remington, Johnston, & Yantis (1992) found that attention is involuntarily drawn away from a given task when a visual stimulus interferes. In this study, participants were presented with four boxes; they were told that an image would precede a letter that they were to memorize. The conditions were to attend to the same box as the image, to a different box, to all four boxes, or to the center. However, even when told not to attend to a certain box, participants were consistently drawn to the image that preceded the letter, resulting in a longer response time in every condition except the same-box condition. The results suggest that a visual stimulus captures attention immediately and involuntarily, even in a controlled setting.[14]
Not all research on visual capture supports the view that vision is constantly dominant. Shams, Kamitani, & Shimojo (2000) found that a visual illusion can be induced by sound in a controlled environment. When a single flash of light is accompanied by a series of auditory beeps, participants perceive the flash as a series of flashes corresponding to the beeps. Because hearing appeared to be the dominant sense in this experiment, there is still much to be determined about visual capture. Like the other studies, however, this one shows that there is a connection between these two senses when it comes to integrating the perception of an environment.[15]
An example of visual capture experienced in daily life is the ventriloquism effect.[2] This occurs when ventriloquists make their speech appear to come from their puppet rather than from their own mouths. In this situation, visual capture allows the auditory stimulus to be controlled by the visual system, producing the congruent experience that the sound is coming from the puppet. Another popular example of visual capture occurs while watching a movie in a theater, when the sound appears to come from the actors' lips. Although this may seem true, the sound is actually coming from the speakers, which are often spread out across the theater rather than placed directly behind wherever the character's mouth may be.
A similar effect can occur while crossing a street. An individual hears the sound of an oncoming car, looks to the left, and sees that the next car is a few blocks away, so it appears safe to cross. On looking to the right, however, they find a car passing them that they had not noticed before. This occurs because the individual attributed the sound of oncoming traffic to the first car, being unaware of the other, closer car. This is an example of visual capture assigning the auditory cue to the wrong visual cue, resulting in a mistake that could be costly.
A phantom limb is the sensation that an amputated limb is still attached. This can cause pain and distress among many amputees, and was thought to be incurable. However, in 1998, Vilayanur S. Ramachandran created a mirror box, which allows an amputee to place their intact limb on one side of the box and, by looking at its mirror image, see what appears to be the amputated limb.[16] Through visual capture, the visual system is able to override the somatosensory system and send feedback to the brain that the limb is intact and not in pain. This has helped relieve problems experienced by individuals with phantom limb pain, as they can train their brain via visual capture to accept that the limb is not cramped in the position it was in when amputated, but is instead free to move around and act as a normal limb.
The McGurk effect is a phenomenon that occurs when the perception of an auditory stimulus is determined by the visual system. For example, when the syllable “ba” is repeated over and over, and one sees an individual saying it, the individual is perceived to be saying “ba”. However, when the same audio is played over video of a person mouthing the syllable “fa”, the fact that the utterance is “ba” is overridden, and the observer hears “fa”. This is once again because vision is able to dominate the auditory system and produce a response that is guided by vision. Because the auditory information is overridden, visual capture is evident: the visual system reorganizes the environmental stimuli to produce a cohesive explanation for what would make the most sense when the different stimuli are combined.[17]
Understanding visual capture has the potential to lead to numerous benefits in the future. Beyond relieving pain in phantom limb syndrome, there are many potential applications. Surround-sound systems have already been built to provide unique listening experiences that “put you right in the middle of the action”. However, a coherent movie-going experience depends on more than sound coming from every direction; it also depends on improvements in the visual quality of movies and on where sound and vision can best be localized.