Depth perception is the ability to perceive distance to objects in the world using the visual system and visual perception. It is a major factor in perceiving the world in three dimensions. Depth perception happens primarily due to stereopsis and accommodation of the eye.
Depth sensation is the corresponding term for non-human animals, since although it is known that they can sense the distance of an object, it is not known whether they perceive it in the same way that humans do.[1]
Depth perception arises from a variety of depth cues. These are typically classified into binocular cues and monocular cues. Binocular cues are based on the receipt of sensory information in three dimensions from both eyes and monocular cues can be observed with just one eye.[2] [3] Binocular cues include retinal disparity, which exploits parallax and vergence. Stereopsis is made possible with binocular vision. Monocular cues include relative size (distant objects subtend smaller visual angles than near objects), texture gradient, occlusion, linear perspective, contrast differences, and motion parallax.[4]
Monocular cues provide depth information even when viewing a scene with only one eye.
See main article: Parallax. When an observer moves, the apparent relative motion of several stationary objects against a background gives hints about their relative distance. If information about the direction and velocity of movement is known, motion parallax can provide absolute depth information.[5] This effect can be seen clearly when driving in a car. Nearby things pass quickly, while far-off objects appear stationary. Some animals that lack binocular vision due to their eyes having little common field-of-view employ motion parallax more explicitly than humans for depth cueing (for example, some types of birds, which bob their heads to achieve motion parallax, and squirrels, which move in lines orthogonal to an object of interest to do the same[6]).
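As a rough illustration of how known self-motion turns motion parallax into an absolute distance, the sketch below assumes an observer translating in a straight line at a known speed and a stationary point seen at a known bearing from the direction of travel. Under those assumptions the point's angular velocity is ω = v·sin(φ)/r, so the range r can be recovered from ω. The function name and the example values are hypothetical and are not taken from the cited sources.

```python
import math


def depth_from_motion_parallax(speed_m_s, angular_velocity_rad_s, bearing_rad=math.pi / 2):
    """Range to a stationary point from its angular velocity in the visual field.

    Assumes straight-line observer motion at a known speed; bearing_rad is the
    angle between the direction of travel and the line of sight (default: abeam).
    """
    if angular_velocity_rad_s <= 0:
        raise ValueError("angular velocity must be positive")
    return speed_m_s * math.sin(bearing_rad) / angular_velocity_rad_s


if __name__ == "__main__":
    # Hypothetical car moving at 25 m/s: a roadside post sweeps past quickly,
    # while a far-off hill barely moves across the visual field.
    print(depth_from_motion_parallax(25.0, 2.5))    # nearby post
    print(depth_from_motion_parallax(25.0, 0.005))  # distant hill
```

The slower an object drifts across the visual field for a given observer speed, the larger the recovered distance, which matches the driving example above.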
When an object moves toward the observer, its retinal projection expands over time, which leads to the perception of movement in a line toward the observer. Another name for this phenomenon is depth from optical expansion.[7] The dynamic stimulus change enables the observer not only to see the object as moving, but also to perceive the distance of the moving object. Thus, in this context, the changing size serves as a distance cue.[8] A related phenomenon is the visual system's capacity to calculate time-to-contact (TTC) of an approaching object from the rate of optical expansion, a useful ability in contexts ranging from driving a car to playing a ball game. However, the calculation of TTC is, strictly speaking, a perception of velocity rather than depth.
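The TTC computation can be sketched with the ratio of an object's angular size to its rate of expansion. The example below is a minimal sketch that assumes a constant approach speed and small visual angles, and estimates the expansion rate from two samples; the function name and the sample values are illustrative assumptions. Neither the physical size of the object nor its distance is needed as an input.

```python
import math


def time_to_contact(theta_prev_rad, theta_now_rad, dt_s):
    """Estimate time-to-contact as angular size divided by its rate of change.

    Assumes constant closing speed and small angles. Returns math.inf when the
    image is not expanding (the object is not approaching).
    """
    expansion_rate = (theta_now_rad - theta_prev_rad) / dt_s
    if expansion_rate <= 0:
        return math.inf
    return theta_now_rad / expansion_rate


if __name__ == "__main__":
    # Hypothetical ball, 0.22 m wide, closing at 5 m/s, sampled 10 ms apart.
    size_m, speed_m_s, dt_s = 0.22, 5.0, 0.01
    z_prev, z_now = 10.0, 10.0 - speed_m_s * dt_s
    tau = time_to_contact(size_m / z_prev, size_m / z_now, dt_s)
    print(tau, "estimated vs", z_now / speed_m_s, "true")
```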
See main article: Kinetic depth effect. If a stationary rigid figure (for example, a wire cube) is placed in front of a point source of light so that its shadow falls on a translucent screen, an observer on the other side of the screen will see a two-dimensional pattern of lines. But if the cube rotates, the visual system will extract the necessary information for perception of the third dimension from the movements of the lines, and a cube is seen. This is an example of the kinetic depth effect.[9] The effect also occurs when the rotating object is solid (rather than an outline figure), provided that the projected shadow consists of lines which have definite corners or end points, and that these lines change in both length and orientation during the rotation.[10]
See main article: Perspective (visual). The property of parallel lines converging in the distance, at infinity, allows us to reconstruct the relative distance of two parts of an object, or of landscape features. An example would be standing on a straight road, looking down the road, and noticing that the road narrows as it recedes into the distance. Visual perception of perspective in real space, for instance in rooms, in settlements and in nature, results from several optical impressions and their interpretation by the visual system. The angle of vision is important for apparent size. A nearby object is imaged on a larger area of the retina, whereas the same object, or an object of the same size, farther away is imaged on a smaller area.[11] The perception of perspective is possible when looking with one eye only, but stereoscopic vision enhances the impression of space. Regardless of whether the light rays entering the eye come from a three-dimensional space or from a two-dimensional image, they strike the retina, which is a surface. What a person sees is based on a reconstruction by their visual system, in which one and the same retinal image can be interpreted both two-dimensionally and three-dimensionally. If a three-dimensional interpretation has been recognised, it takes precedence and determines the perception.[12] In spatial vision, the horizontal line of sight can play a role. In the picture taken from the window of a house, the horizontal line of sight is at the level of the second floor (yellow line). Below this line, the farther away objects are, the higher up in the visual field they appear. Above the horizontal line of sight, objects that are farther away appear lower than those that are closer. To represent spatial impressions in graphical perspective, one can use a vanishing point.[13] When looking across long geographical distances, perspective effects result partly, but not only, from the angle of vision.
In picture 5 of the series, in the background is Mont Blanc, the highest mountain in the Alps. It appears lower than the mountain in front in the center of the picture. Measurements and calculations can be used to determine how much of the subjectively perceived proportions is due to the curvature of the Earth.
If two objects are known to be the same size (for example, two trees) but their absolute size is unknown, relative size cues can provide information about the relative depth of the two objects. If one subtends a larger visual angle on the retina than the other, the object which subtends the larger visual angle appears closer.
Since the visual angle of an object projected onto the retina decreases with distance, this information can be combined with previous knowledge of the object's size to determine the absolute depth of the object. For example, people are generally familiar with the size of an average automobile. This prior knowledge can be combined with information about the angle it subtends on the retina to determine the absolute depth of an automobile in a scene.
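The familiar-size computation follows from simple geometry: an object of size s viewed face-on at distance d subtends an angle θ with tan(θ/2) = s/(2d). The sketch below inverts that relation; the function names and the 4.5 m car length are illustrative assumptions rather than values from the source.

```python
import math


def visual_angle_rad(size_m, distance_m):
    """Visual angle subtended by an object of a given size seen face-on."""
    return 2.0 * math.atan(size_m / (2.0 * distance_m))


def distance_from_familiar_size(known_size_m, visual_angle):
    """Absolute distance recovered from a known size and the angle it subtends."""
    if visual_angle <= 0:
        raise ValueError("visual angle must be positive")
    return known_size_m / (2.0 * math.tan(visual_angle / 2.0))


if __name__ == "__main__":
    car_length_m = 4.5  # assumed "familiar" size of an average automobile
    angle = visual_angle_rad(car_length_m, 100.0)
    print(math.degrees(angle), distance_from_familiar_size(car_length_m, angle))
```

If the assumed size is wrong (a toy car mistaken for a real one, say), the recovered distance is wrong by the same factor, which is why this cue depends on prior knowledge.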
Even if the actual size of the object is unknown and there is only one object visible, a smaller object seems farther away than a large object that is presented at the same location.[14]
See main article: Aerial perspective. Due to light scattering by the atmosphere, objects that are a great distance away have lower luminance contrast and lower color saturation. As a result, objects appear hazier the farther they are from the observer. In computer graphics, this is often called "distance fog". The foreground has high contrast; the background has low contrast. Objects differing only in their contrast with a background appear to be at different depths.[15] The color of distant objects is also shifted toward the blue end of the spectrum (for example, distant mountains). Some painters, notably Cézanne, employ "warm" pigments (red, yellow and orange) to bring features forward towards the viewer, and "cool" ones (blue, violet, and blue-green) to indicate the part of a form that curves away from the picture plane.
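Distance fog is often approximated with an exponential falloff, in which an object's colour is blended toward a haze colour as distance grows. The following is a minimal sketch of that idea; the density value, the bluish haze colour and the function name are assumptions chosen for illustration, not parameters of any particular renderer.

```python
import math


def apply_distance_fog(color, fog_color, distance_m, density=0.02):
    """Blend an RGB colour toward the fog colour with exponential falloff.

    transmittance = exp(-density * distance): 1.0 at the eye, approaching 0.0
    far away, where the object takes on the haze colour and loses contrast.
    """
    transmittance = math.exp(-density * distance_m)
    return tuple(transmittance * c + (1.0 - transmittance) * f
                 for c, f in zip(color, fog_color))


if __name__ == "__main__":
    red_roof = (0.8, 0.1, 0.1)
    haze = (0.6, 0.7, 0.9)  # assumed bluish atmospheric colour
    for d in (0.0, 50.0, 500.0):
        print(d, apply_distance_fog(red_roof, haze, d))
```

With a bluish haze colour, distant objects lose both contrast and saturation and drift toward blue, which mirrors the effects described above.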
See main article: Accommodation (eye). Accommodation is an oculomotor cue for depth perception. When humans focus on distant objects, the ciliary muscles relax, which lets the suspensory fibres stretch the eye lens, making it thinner and hence changing its focal length. The kinesthetic sensations of the contracting and relaxing ciliary muscles (intraocular muscles) are sent to the visual cortex, where they are used for interpreting distance and depth. Accommodation is only effective for distances less than 2 meters.
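One way to see why accommodation carries little distance information for far objects is to express the focusing demand in dioptres, the reciprocal of the viewing distance in metres. The sketch below is only an optical illustration under that standard definition; the function name is hypothetical, and the code does not model the ciliary muscle or any perceptual threshold.

```python
def accommodation_demand_diopters(distance_m):
    """Optical power, in dioptres, needed to focus at a given distance."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return 1.0 / distance_m


if __name__ == "__main__":
    for d in (0.25, 0.5, 1.0, 2.0, 10.0, 100.0):
        print(f"{d:6.2f} m -> {accommodation_demand_diopters(d):.2f} D")
```

Moving from 0.25 m to 2 m changes the demand by 3.5 dioptres, whereas everything from 2 m out to infinity spans only 0.5 dioptres, so the lens state changes very little across far distances.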
See main article: Occultation. Occultation (also referred to as interposition) happens when near surfaces overlap far surfaces.[16] If one object partially blocks the view of another object, humans perceive it as closer. However, this information only allows the observer to rank objects by relative nearness. Monocular ambient occlusion cues arise from the object's texture and geometry. These phenomena are able to reduce depth perception latency in both natural and artificial stimuli.[17] [18]
See main article: Curvilinear perspective. At the outer extremes of the visual field, parallel lines become curved, as in a photo taken through a fisheye lens. This effect, although it is usually eliminated from both art and photos by the cropping or framing of a picture, greatly enhances the viewer's sense of being positioned within a real, three-dimensional space. (Classical perspective has no use for this so-called "distortion", although in fact the "distortions" strictly obey optical laws and provide perfectly valid visual information, just as classical perspective does for the part of the field of vision that falls within its frame.)
See main article: Texture gradient. Fine details on nearby objects can be seen clearly, whereas such details are not visible on faraway objects. A texture gradient is the gradual change in the apparent grain of a surface with distance. For example, on a long gravel road, the shape, size and colour of the gravel near the observer can be clearly seen. In the distance, the road's texture cannot be clearly differentiated.
See main article: Lighting and Shading. The way that light falls on an object and reflects off its surfaces, and the shadows that are cast by objects provide an effective cue for the brain to determine the shape of objects and their position in space.[19]
See main article: Depth of field. Selective image blurring is very commonly used in photography and video to establish the impression of depth. This can act as a monocular cue even when all other cues are removed. It may contribute to depth perception in natural retinal images, because the depth of focus of the human eye is limited. In addition, there are several depth estimation algorithms based on defocus and blurring.[20] Some jumping spiders are known to use image defocus to judge depth.[21]
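The link between blur and depth can be illustrated with the thin-lens blur-circle relation used in photography: for a lens of focal length f and aperture diameter A focused at distance s, a point at distance d is imaged as a disc of diameter A · |d − s| / d · f / (s − f). The sketch below evaluates that relation; it assumes an ideal thin lens, and the function name and camera parameters are illustrative. It is not one of the depth estimation algorithms cited above.

```python
def blur_circle_diameter(object_dist_m, focus_dist_m, focal_length_m, aperture_diam_m):
    """Diameter of the blur disc on the sensor for an ideal thin lens."""
    if focus_dist_m <= focal_length_m:
        raise ValueError("focus distance must exceed the focal length")
    magnification = focal_length_m / (focus_dist_m - focal_length_m)
    defocus = abs(object_dist_m - focus_dist_m) / object_dist_m
    return aperture_diam_m * defocus * magnification


if __name__ == "__main__":
    # Assumed 50 mm lens with a 25 mm aperture, focused at 2 m.
    for d in (1.0, 2.0, 3.0, 4.0, 8.0):
        c = blur_circle_diameter(d, 2.0, 0.05, 0.025)
        print(f"object at {d} m -> blur {c * 1000:.3f} mm")
```

Blur alone is ambiguous about sign: a point nearer than the focus distance and one beyond it can produce the same blur diameter, so defocus-based methods generally need extra information to tell the two apart.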
When an object is visible relative to the horizon, humans tend to perceive objects which are closer to the horizon as being farther away from them, and objects which are farther from the horizon as being closer to them.[22] In addition, if an object moves from a position close to the horizon to a position higher or lower than the horizon, it will appear to move closer to the viewer.
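For the special case of an object resting on flat ground, the horizon cue has a simple geometric reading: with the eye at height h, a point seen at angle α below the horizon lies at ground distance h / tan(α). The sketch below assumes level terrain and a horizon at eye level; the function name and the 1.6 m eye height are illustrative assumptions.

```python
import math


def ground_distance_from_horizon_angle(eye_height_m, angle_below_horizon_rad):
    """Distance along flat ground to a point seen below the horizon line."""
    if not 0.0 < angle_below_horizon_rad < math.pi / 2:
        raise ValueError("angle must lie strictly between 0 and 90 degrees")
    return eye_height_m / math.tan(angle_below_horizon_rad)


if __name__ == "__main__":
    eye_height_m = 1.6  # assumed standing eye height
    for degrees in (30.0, 10.0, 1.0):
        d = ground_distance_from_horizon_angle(eye_height_m, math.radians(degrees))
        print(f"{degrees:5.1f} deg below horizon -> {d:.1f} m")
```

As the angle below the horizon shrinks, the recovered distance grows rapidly, consistent with objects nearer the horizon being seen as farther away.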
Ocular parallax is a perceptual effect where the rotation of the eye causes perspective-dependent image shifts. This happens because the optical center and the rotation center of the eye are not the same.[23] Ocular parallax does not require head movement. It is separate and distinct from motion parallax.
Binocular cues provide depth information when viewing a scene with both eyes.
See main article: Stereopsis. Animals that have their eyes placed frontally can also use information derived from the different projections of objects onto each retina to judge depth. By using two images of the same scene obtained from slightly different angles, it is possible to triangulate the distance to an object with a high degree of accuracy. The left and right eyes each view an object from a slightly different angle. This happens because of the horizontal separation of the eyes. If an object is far away, the disparity of that image falling on both retinas will be small. If the object is near, the disparity will be large. It is stereopsis that tricks people into thinking they perceive depth when viewing Magic Eye pictures, autostereograms, 3-D movies, and stereoscopic photos.
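The triangulation idea is the same one used by stereo cameras. For two parallel pinhole cameras separated by a baseline B with focal length f (in pixels), a point whose image shifts by a disparity of d pixels between the two views lies at depth Z = f·B/d. The sketch below is a camera-model analogy under those assumptions, not a model of the human visual system; the function name and the numbers are hypothetical.

```python
import math


def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth of a point from its horizontal disparity in a rectified stereo pair."""
    if disparity_px < 0:
        raise ValueError("disparity must be non-negative")
    if disparity_px == 0:
        return math.inf  # zero disparity corresponds to a point at infinity
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Assumed 700 px focal length and a 6.5 cm baseline (roughly eye spacing).
    for disparity in (91.0, 9.1, 0.91):
        print(disparity, "px ->", depth_from_disparity(700.0, 0.065, disparity), "m")
```

Large disparities map to near points and small disparities to far ones, as described above.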
Convergence is a binocular oculomotor cue for distance and depth perception. Because of stereopsis, the two eyeballs fixate on the same object; in doing so they converge. The convergence stretches the extraocular muscles; the receptors for this are muscle spindles. As happens with the monocular accommodation cue, kinesthetic sensations from these extraocular muscles also help in distance and depth perception. The angle of convergence is smaller when the eyes are fixating on objects which are far away. Convergence is effective for distances less than 10 meters.[24]
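The geometry behind the convergence cue can be sketched as follows: for eyes separated by an interpupillary distance I fixating a point straight ahead at distance D, the convergence angle is 2·atan(I / 2D). The code assumes symmetric fixation and an interpupillary distance of 63 mm, both of which are illustrative assumptions, as are the function names.

```python
import math

IPD_M = 0.063  # assumed interpupillary distance


def convergence_angle_rad(distance_m, ipd_m=IPD_M):
    """Angle between the two lines of sight when fixating straight ahead."""
    return 2.0 * math.atan(ipd_m / (2.0 * distance_m))


def distance_from_convergence_m(angle_rad, ipd_m=IPD_M):
    """Fixation distance recovered from the convergence angle."""
    if angle_rad <= 0:
        raise ValueError("convergence angle must be positive")
    return ipd_m / (2.0 * math.tan(angle_rad / 2.0))


if __name__ == "__main__":
    for d in (0.25, 1.0, 10.0, 100.0):
        print(f"{d:6.2f} m -> {math.degrees(convergence_angle_rad(d)):.3f} deg")
```

The angle changes by many degrees over the first metre but by only a fraction of a degree beyond about 10 m, which is consistent with the cue being useful mainly at short range.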
Antonio Medina Puerta demonstrated that retinal images with no parallax disparity but with different shadows were fused stereoscopically, imparting depth perception to the imaged scene. He named the phenomenon "shadow stereopsis". Shadows are therefore an important stereoscopic cue for depth perception.[25]
Of these various cues, only convergence, accommodation and familiar size provide absolute distance information. All other cues are relative (as in, they can only be used to tell which objects are closer relative to others). Stereopsis is merely relative because a greater or lesser disparity for nearby objects could either mean that those objects differ more or less substantially in relative depth or that the foveated object is nearer or further away (the further away a scene is, the smaller is the retinal disparity indicating the same depth difference).
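The parenthetical point about disparity shrinking with distance can be checked numerically. The sketch below computes the relative disparity between two points straight ahead as the difference of their convergence angles, and compares it with the small-angle approximation I·Δd / (d·(d + Δd)). The 63 mm interpupillary distance and the function names are assumptions made for illustration.

```python
import math


def vergence_angle(distance_m, ipd_m=0.063):
    """Angle between the lines of sight of the two eyes for a point straight ahead."""
    return 2.0 * math.atan(ipd_m / (2.0 * distance_m))


def relative_disparity(near_m, far_m, ipd_m=0.063):
    """Disparity between two points, as the difference of their vergence angles."""
    return vergence_angle(near_m, ipd_m) - vergence_angle(far_m, ipd_m)


if __name__ == "__main__":
    step_m = 0.1  # the same 10 cm depth difference at two viewing distances
    for d in (1.0, 10.0):
        exact = relative_disparity(d, d + step_m)
        approx = 0.063 * step_m / (d * (d + step_m))
        print(f"{d:4.1f} m: exact {exact:.6e} rad, approx {approx:.6e} rad")
```

Because the same depth step produces a far smaller disparity at 10 m than at 1 m, a given disparity value cannot be converted into a depth difference without an independent estimate of the viewing distance.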
Isaac Newton proposed that the optic nerve of humans and other primates has a specific architecture on its way from the eye to the brain. Nearly half of the fibres from the human retina project to the brain hemisphere on the same side as the eye from which they originate. That architecture is labelled hemi-decussation or ipsilateral (same sided) visual projections (IVP). In most other animals, these nerve fibres cross to the opposite side of the brain.
Bernhard von Gudden showed that the optic chiasm (OC) contains both crossed and uncrossed retinal fibers, and Ramon y Cajal[26] observed that the degree of hemidecussation differs between species.[27] [26] Gordon Lynn Walls formalized a commonly accepted notion into the law of Newton–Müller–Gudden (NGM), which states that the degree of optic fibre decussation in the optic chiasm is inversely related to the degree of frontal orientation of the optical axes of the eyes.[28] In other words, the number of fibers that do not cross the midline is proportional to the size of the binocular visual field. However, a problem with the Newton–Müller–Gudden law is the considerable interspecific variation in IVP seen in non-mammalian species. That variation is unrelated to mode of life, taxonomic situation, and the overlap of visual fields.[29]
Thus, the general hypothesis was for a long time that the arrangement of nerve fibres in the optic chiasm in primates and humans developed primarily to create accurate depth perception (stereopsis); more explicitly, that the eyes observe an object from somewhat dissimilar angles and that this difference in angle helps the brain to evaluate the distance.
The eye-forelimb (EF) hypothesis suggests that the need for accurate eye-hand control was key in the evolution of stereopsis. According to the EF hypothesis, stereopsis is an evolutionary spinoff from a more vital process: the construction of the optic chiasm and the position of the eyes (the degree of lateral or frontal direction) are shaped by evolution to help the animal coordinate its limbs (hands, claws, wings or fins).[30]
The EF hypothesis postulates that it has selective value to have short neural pathways between the areas of the brain that receive visual information about the hand and the motor nuclei that control the coordination of the hand. The essence of the EF hypothesis is that evolutionary transformation in the OC will affect the length, and thereby the speed, of these neural pathways.[31] Having the primate type of OC means that the motor neurons controlling, say, right-hand movement, the neurons receiving sensory (e.g. tactile) information about the right hand, and the neurons obtaining visual information about the right hand will all be situated in the same (left) brain hemisphere. The reverse is true for the left hand: the processing of visual and tactile information and of motor commands all takes place in the right hemisphere. Cats and arboreal (tree-climbing) marsupials have analogous arrangements (between 30 and 45% of IVP and forward-directed eyes). The result is that visual information about their forelimbs reaches the proper (executing) hemisphere. Evolution has resulted in small, gradual fluctuations in the direction of the nerve pathways in the OC. This transformation can go in either direction.[32] Snakes, cyclostomes and other animals that lack extremities have relatively many IVP. Notably, these animals have no limbs (hands, paws, fins or wings) to direct. Besides, the left and right body parts of snakelike animals cannot move independently of each other. For example, if a snake coils clockwise, its left eye sees only the left body part, and in an anticlockwise position the same eye will see just the right body part. For that reason, it is functional for snakes to have some IVP in the OC (Naked). Cyclostome descendants (in other words, most vertebrates) that ceased to curl and instead developed forelimbs would be favored by achieving completely crossed pathways as long as the forelimbs were primarily occupied in a lateral direction.
Reptiles such as snakes, which lost their limbs, would gain by reacquiring a cluster of uncrossed fibres in their evolution. That seems to have happened, providing further support for the EF hypothesis.
Mice's paws are usually busy only in the lateral visual fields. So, it is in accordance with the EF hypothesis that mice have laterally situated eyes and very few uncrossed fibres in the OC. The list of examples from the animal kingdom supporting the EF hypothesis is long (BBE). The EF hypothesis applies to essentially all vertebrates, while the NGM law and the stereopsis hypothesis largely apply just to mammals. Even some mammals display important exceptions, e.g. dolphins have only crossed pathways although they are predators.
It is commonly suggested that predatory animals generally have frontally placed eyes, since that permits them to evaluate the distance to prey, whereas preyed-upon animals have eyes in a lateral position, since that permits them to scan for and detect the enemy in time. However, many predatory animals may also become prey, and several predators, for instance the crocodile, have laterally situated eyes and no IVP at all. That OC architecture provides short nerve connections and optimal eye control of the crocodile's front foot.
Birds usually have laterally situated eyes; in spite of that, they manage to fly through, e.g., a dense wood. In conclusion, the EF hypothesis does not reject a significant role of stereopsis, but proposes that primates' superb depth perception (stereopsis) evolved to be in service of the hand; that is, the particular architecture of the primate visual system largely evolved to establish rapid neural pathways between neurons involved in hand coordination, assisting the hand in gripping the correct branch.
Most open-plain herbivores, especially hoofed grazers, lack binocular vision because they have their eyes on the sides of the head, providing a panoramic, almost 360°, view of the horizon, enabling them to notice the approach of predators from almost any direction. However, most predators have both eyes looking forwards, allowing binocular depth perception and helping them to judge distances when they pounce or swoop down onto their prey. Animals that spend a lot of time in trees take advantage of binocular vision in order to accurately judge distances when rapidly moving from branch to branch.
Matt Cartmill, a physical anthropologist and anatomist at Boston University, has criticized this theory, citing other arboreal species which lack binocular vision, such as squirrels and certain birds. Instead, he proposes a "Visual Predation Hypothesis," which argues that ancestral primates were insectivorous predators resembling tarsiers, subject to the same selection pressure for frontal vision as other predatory species. He also uses this hypothesis to account for the specialization of primate hands, which he suggests became adapted for grasping prey, somewhat like the way raptors employ their talons.
Photographs capturing perspective are two-dimensional images that often convey the illusion of depth. Photography uses size, environmental context, lighting, texture gradients, and other effects to capture the illusion of depth.[33] Stereoscopes and Viewmasters, as well as 3D films, employ binocular vision by forcing the viewer to see two images created from slightly different positions (points of view). Charles Wheatstone was the first to discuss binocular disparity as a cue for depth perception.[34] He invented the stereoscope, an instrument with two eyepieces that displays two photographs of the same scene taken from slightly different angles. When observed separately by each eye, the pairs of images induce a clear sense of depth.[35] By contrast, a telephoto lens (used in televised sports, for example, to zero in on members of a stadium audience) has the opposite effect. The viewer sees the size and detail of the scene as if it were close enough to touch, but the camera's perspective is still derived from its actual position a hundred meters away, so background faces and objects appear about the same size as those in the foreground.
Trained artists are keenly aware of the various methods for indicating spatial depth (color shading, distance fog, perspective and relative size), and take advantage of them to make their works appear "real". The viewer feels it would be possible to reach in and grab the nose of a Rembrandt portrait or an apple in a Cézanne still life—or step inside a landscape and walk around among its trees and rocks.
Cubism was based on the idea of incorporating multiple points of view in a painted image, as if to simulate the visual experience of being physically in the presence of the subject, and seeing it from different angles. The radical experiments of Georges Braque and Pablo Picasso, Jean Metzinger's Nu à la cheminée,[36] Albert Gleizes's La Femme aux Phlox,[37] [38] and Robert Delaunay's views of the Eiffel Tower[39] [40] employ the explosive angularity of Cubism to exaggerate the traditional illusion of three-dimensional space. The subtle use of multiple points of view can be found in the pioneering late work of Cézanne, which both anticipated and inspired the first actual Cubists. Cézanne's landscapes and still lifes powerfully suggest the artist's own highly developed depth perception. At the same time, like the other Post-Impressionists, Cézanne had learned from Japanese art the significance of respecting the flat (two-dimensional) rectangle of the picture itself; Hokusai and Hiroshige ignored or even reversed linear perspective and thereby remind the viewer that a picture can only be "true" when it acknowledges the truth of its own flat surface. By contrast, European "academic" painting was devoted to a sort of Big Lie: that the surface of the canvas is only an enchanted doorway to a "real" scene unfolding beyond, and that the artist's main task is to distract the viewer from any disenchanting awareness of the presence of the painted canvas. Cubism, and indeed most of modern art, is an attempt to confront, if not resolve, the paradox of suggesting spatial depth on a flat surface, and to explore that inherent contradiction through innovative ways of seeing, as well as new methods of drawing and painting.
In robotics and computer vision, depth perception is often achieved using sensors such as RGBD cameras.[41]
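As an illustration of how a per-pixel depth reading from such a sensor becomes a 3D position, the sketch below back-projects a pixel through a pinhole camera model. The intrinsic parameters (fx, fy, cx, cy) and the function name are hypothetical placeholders; a real device would supply its own calibration, and the code is not tied to any particular camera SDK.

```python
def backproject_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Convert a pixel (u, v) with measured depth into camera-frame XYZ.

    Pinhole model: fx and fy are focal lengths in pixels, (cx, cy) is the
    principal point, and depth_m is the distance along the optical axis.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


if __name__ == "__main__":
    # Assumed intrinsics for a 640x480 depth image.
    fx = fy = 525.0
    cx, cy = 320.0, 240.0
    print(backproject_pixel(320.0, 240.0, 2.0, fx, fy, cx, cy))  # on the optical axis
    print(backproject_pixel(845.0, 240.0, 2.0, fx, fy, cx, cy))  # far to the right
```

Applying this to every pixel of a depth image yields a point cloud in the camera's coordinate frame.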