The two-streams hypothesis is a model of the neural processing of vision as well as hearing.[1] The hypothesis, given its initial characterisation in a paper by David Milner and Melvyn A. Goodale in 1992, argues that humans possess two distinct visual systems.[2] More recently, evidence has emerged for two distinct auditory systems as well. As visual information exits the occipital lobe, and as sound leaves the phonological network, it follows two main pathways, or "streams". The ventral stream (also known as the "what pathway") leads to the temporal lobe, which is involved in object identification and recognition. The dorsal stream (or "how pathway") leads to the parietal lobe, which is involved in processing the object's spatial location relative to the viewer and in speech repetition.
Several researchers had proposed similar ideas previously. The authors themselves credit as inspiration Weiskrantz's work on blindsight and earlier neuroscientific vision research. Schneider first proposed the existence of two visual systems, for localisation and identification, in 1969.[3] Ingle described two independent visual systems in frogs in 1973.[4] Ettlinger reviewed the existing neuropsychological evidence for such a distinction in 1990.[5] Moreover, Trevarthen had offered an account of two separate mechanisms of vision in monkeys as early as 1968.[6]
In 1982, Ungerleider and Mishkin distinguished the dorsal and ventral streams, processing spatial and visual features respectively, on the basis of their lesion studies in monkeys, thereby proposing the original "where" versus "what" distinction.[7] Though this framework was superseded by that of Milner and Goodale, it remains influential.[8]
One hugely influential source of information for the model has been experimental work exploring the extant abilities of the visual agnosic patient D.F. The first, and most influential, report came from Goodale and colleagues in 1991,[9] and work on her was still being published two decades later. This has been the focus of some criticism of the model, owing to a perceived over-reliance on findings from a single case.
Goodale and Milner amassed an array of anatomical, neuropsychological, electrophysiological, and behavioural evidence for their model. According to their data, the ventral 'perceptual' stream computes a detailed map of the world from visual input, which can then be used for cognitive operations, while the dorsal 'action' stream transforms incoming visual information into the requisite egocentric (head-centered) coordinate system for skilled motor planning. The model also posits that visual perception encodes spatial properties of objects, such as size and location, relative to other objects in the visual field; in other words, it uses relative metrics and scene-based frames of reference. Visual action planning and coordination, on the other hand, uses absolute metrics determined via egocentric frames of reference, computing the actual properties of objects relative to the observer. Thus, grasping movements directed towards objects embedded in scenes that induce size-contrast illusions have been shown to escape the effects of these illusions, as different frames of reference and metrics are involved in perceiving the illusion versus executing the grasping act.[10] Norman[11] proposed a similar dual-process model of vision and described eight main differences between the two systems, consistent with other two-system models.
| Factor | Ventral system ("what") | Dorsal system ("how") |
|---|---|---|
| Function | Recognition/identification | Visually guided behaviour |
| Sensitivity | High spatial frequencies (details) | High temporal frequencies (motion) |
| Memory | Long-term stored representations | Only very short-term storage |
| Speed | Relatively slow | Relatively fast |
| Consciousness | Typically high | Typically low |
| Frame of reference | Allocentric or object-centered | Egocentric or viewer-centered |
| Visual input | Mainly foveal or parafoveal | Across retina |
| Monocular vision | Generally reasonably small effects | Often large effects, e.g. motion parallax |
The dorsal stream is proposed to be involved in the guidance of actions and in recognizing where objects are in space. It projects from the primary visual cortex to the posterior parietal cortex. It was initially termed the "where" pathway, since the dorsal stream was thought to process information about the spatial properties of an object.[12] However, later research on the well-studied neuropsychological patient D.F. revealed that the dorsal stream processes the visual information needed to construct representations of objects one wishes to manipulate. Those findings led researchers to rename the dorsal stream the "how" pathway.[13][14] The dorsal stream is interconnected with the parallel ventral stream (the "what" stream), which runs downward from V1 into the temporal lobe.
The dorsal stream is involved in spatial awareness and guidance of actions (e.g., reaching). In this it has two distinct functional characteristics—it contains a detailed map of the visual field, and is also good at detecting and analyzing movements.
The dorsal stream commences with purely visual functions in the occipital lobe before gradually transferring to spatial awareness at its termination in the parietal lobe.
The posterior parietal cortex is essential for "the perception and interpretation of spatial relationships, accurate body image, and the learning of tasks involving coordination of the body in space".[15]
It contains individually functioning lobules. The lateral intraparietal area (LIP) contains neurons that produce enhanced activation when attention is moved onto a stimulus or when the animal saccades towards a visual stimulus, while the ventral intraparietal area (VIP) integrates visual and somatosensory information.
Damage to the posterior parietal cortex causes a number of spatial disorders.
The ventral stream is associated with object recognition and form representation. Also described as the "what" stream, it has strong connections to the medial temporal lobe (which is associated with long-term memories), the limbic system (which controls emotions), and the dorsal stream (which deals with object locations and motion).
The ventral stream receives its main input from the parvocellular (as opposed to magnocellular) layers of the lateral geniculate nucleus of the thalamus. These neurons project successively to V1 sublayers 4Cβ, 4A, 3B and 2/3a.[16] From there, the ventral pathway goes through V2 and V4 to areas of the inferior temporal lobe: PIT (posterior inferotemporal), CIT (central inferotemporal), and AIT (anterior inferotemporal). Each visual area contains a full representation of visual space; that is, it contains neurons whose receptive fields together cover the entire visual field. Visual information enters the ventral stream through the primary visual cortex and travels through the rest of the areas in sequence.
Moving along the stream from V1 to AIT, receptive field size, response latency, and the complexity of neuronal tuning all increase. For example, recent studies have shown that the V4 area is responsible for color perception in humans and the V8 (VO1) area for shape perception, while the VO2 area, located between these regions and the parahippocampal cortex, integrates information about the color and shape of stimuli into a holistic image.[17]
All the areas in the ventral stream are influenced by extraretinal factors in addition to the nature of the stimulus in their receptive field. These factors include attention, working memory, and stimulus salience. Thus the ventral stream does not merely provide a description of the elements in the visual world—it also plays a crucial role in judging the significance of these elements.
Damage to the ventral stream can cause inability to recognize faces or interpret facial expression.[18]
Along with the visual ventral pathway being important for visual processing, there is also a ventral auditory pathway emerging from the primary auditory cortex.[19] In this pathway, phonemes are processed posteriorly to syllables and environmental sounds.[20] The information then joins the visual ventral stream at the middle temporal gyrus and temporal pole. Here the auditory objects are converted into audio-visual concepts.[21]
The function of the auditory dorsal pathway is to map auditory sensory representations onto articulatory motor representations. Hickok & Poeppel claim that the auditory dorsal pathway is necessary because, "learning to speak is essentially a motor learning task. The primary input to this is sensory, speech in particular. So, there must be a neural mechanism that both codes and maintains instances of speech sounds, and can use these sensory traces to guide the tuning of speech gestures so that the sounds are accurately reproduced."[22] In contrast to the ventral stream's auditory processing, information enters from the primary auditory cortex into the posterior superior temporal gyrus and posterior superior temporal sulcus. From there the information moves to the beginning of the dorsal pathway, located at the boundary of the temporal and parietal lobes near the Sylvian fissure. The first step of the dorsal pathway is the sensorimotor interface, located in the left Sylvian parietal-temporal area (Spt), within the Sylvian fissure at the parietal-temporal boundary. The Spt is important for perceiving and reproducing sounds: evidence includes its role in acquiring new vocabulary, the disruption of speech production and of auditory feedback when it is lesioned, articulatory decline in late-onset deafness, and the non-phonological residue of Wernicke's aphasia (deficient self-monitoring). It also supports the basic neuronal mechanisms of phonological short-term memory; without the Spt, language acquisition is impaired. The information then moves on to the articulatory network, which is divided into two parts. Articulatory network 1, which processes motor syllable programs, is located in the left posterior inferior frontal gyrus and Brodmann's area 44 (pIFG-BA44). Articulatory network 2, for motor phoneme programs, is located in the left M1-vBA6.[23]
Conduction aphasia affects a subject's ability to reproduce speech (typically by repetition), though it has no influence on the subject's ability to comprehend spoken language. This suggests that conduction aphasia reflects an impairment not of the auditory ventral pathway but of the auditory dorsal pathway. Buchsbaum et al.[24] found that conduction aphasia can result from damage, particularly lesions, to the Spt. This is shown by the Spt's involvement in acquiring new vocabulary: experiments have shown that while most conduction aphasics can repeat high-frequency, simple words, their ability to repeat low-frequency, complex words is impaired. The Spt connects the motor and auditory systems by making auditory code accessible to the motor cortex. It appears that the motor cortex recreates high-frequency, simple words (like "cup") in order to access them more quickly and efficiently, while low-frequency, complex words (like "Sylvian parietal temporal") require more active, online regulation by the Spt. This explains why conduction aphasics have particular difficulty with low-frequency words, which require a more hands-on process for speech production. "Functionally, conduction aphasia has been characterized as a deficit in the ability to encode phonological information for production," namely because of a disruption in the motor-auditory interface.[25] Conduction aphasia has been more specifically related to damage of the arcuate fasciculus, which is vital for both speech and language comprehension, as it forms the connection between Broca's and Wernicke's areas.[25]
Goodale and Milner's innovation was to shift the perspective from an emphasis on input distinctions, such as object location versus object properties, to an emphasis on the functional relevance of vision to behaviour: for perception or for action. Contemporary perspectives, however, informed by empirical work over the past two decades, offer a more complex account than a simple separation of function into two streams.[26] Recent experimental work, for instance, has challenged these findings and suggested that the apparent dissociation between the effects of illusions on perception and action is due to differences in attention, task demands, and other confounds.[27][28] Other empirical findings, however, cannot be so easily dismissed and provide strong support for the idea that skilled actions such as grasping are not affected by pictorial illusions.[29][30][31][32]
Moreover, recent neuropsychological research has questioned the validity of the dissociation between the two streams that has provided the cornerstone of evidence for the model. The dissociation between visual agnosia and optic ataxia has been challenged by several researchers as not as strong as originally portrayed: Hesse and colleagues demonstrated dorsal stream impairments in patient D.F.,[33] and Himmelbach and colleagues reassessed D.F.'s abilities, applying more rigorous statistical analysis and demonstrating that the dissociation was not as strong as first thought.[34]
A 2009 review of the accumulated evidence for the model concluded that, whilst the spirit of the model has been vindicated, the independence of the two streams has been overemphasised.[35] Goodale and Milner themselves have proposed the analogy of tele-assistance, one of the most efficient schemes devised for the remote control of robots working in hostile environments. In this account, the dorsal stream is viewed as a semi-autonomous function that operates under the guidance of executive functions, which are themselves informed by ventral stream processing. Thus the emerging perspective within neuropsychology and neurophysiology is that, whilst a two-systems framework was a necessary advance that stimulated study of the highly complex and differentiated functions of the two neural pathways, the reality is more likely to involve considerable interaction between vision-for-action and vision-for-perception. Robert McIntosh and Thomas Schenk summarize this position as follows: