The sharpness of our senses is defined by the finest detail we can discriminate. Visual acuity is measured by the smallest letters that can be distinguished on a chart and is governed by the anatomical spacing of the mosaic of sensory elements on the retina. Yet spatial distinctions can be made on a finer scale still: misalignment of borders can be detected with a precision up to 10 times better than visual acuity, as already shown by Ewald Hering in 1899.[1] This hyperacuity, transcending by far the size limits set by the retinal 'pixels', depends on sophisticated information processing in the brain.
The best example of the distinction between acuity and hyperacuity comes from vision, for example when observing stars in the night sky. The first stage is the optical imaging of the outside world on the retina. Light impinges on the mosaic of receptor sense cells, rods and cones, which covers the retinal surface without gaps or overlap, just like the detecting pixels in the film plane of digital cameras. Each receptor accepts all the light reaching it but acts as a unit, representing a single location in visual space. This compartmentalization sets a limit to the decision whether an image came from a single or a double star (resolution). For a percept of separately articulated stars to emerge, the images of the two must be wide enough apart to leave at least one intervening pixel relatively unstimulated between them. This defines the resolution limit and the basis of visual acuity.
A quite different mechanism operates in hyperacuity, whose quintessential example, and the one for which the word was initially coined,[2][3] is vernier acuity: alignment of two edges or lines can be judged with a precision five or ten times better than acuity. In computer graphics the phrase "sub-pixel resolution" is sometimes used in discussions of anti-aliasing and geometrical superresolution. Although what is involved is in fact not resolution (a qualitative distinction: is it one or two?) but localization (a quantitative judgment: exactly where?), the phrase captures the process. When an image spreads across several pixels, each with graded intensity response but only a single spatial value, the position of the image center can be located more exactly than the width of the pixel, much like the mean of a histogram can be calculated to a fraction of the bin width.
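The histogram analogy can be made concrete with a minimal sketch in Python. It assumes a Gaussian blur spot integrated over unit-width pixels; the standard deviation of 1.3 pixels and the spot position are illustrative choices, and the function names `pixel_responses` and `centroid` are hypothetical, not taken from any cited work. The intensity-weighted mean of the graded pixel responses recovers the spot position to a small fraction of a pixel.

```python
import math

def pixel_responses(center, sigma, n_pixels):
    """Integrate a Gaussian blur spot over unit-width pixels [i, i+1)."""
    def cdf(x):
        # Cumulative distribution of the Gaussian spot at position x.
        return 0.5 * (1.0 + math.erf((x - center) / (sigma * math.sqrt(2.0))))
    return [cdf(i + 1) - cdf(i) for i in range(n_pixels)]

def centroid(responses):
    """Intensity-weighted mean position, using each pixel's midpoint."""
    total = sum(responses)
    return sum((i + 0.5) * r for i, r in enumerate(responses)) / total

# Illustrative values (assumptions): the spot lies at a fractional position.
true_center = 10.37
responses = pixel_responses(true_center, sigma=1.3, n_pixels=21)
estimate = centroid(responses)

print(f"true position      : {true_center}")
print(f"centroid estimate  : {estimate:.4f}")
print(f"error (in pixels)  : {abs(estimate - true_center):.2e}")
```

Each pixel reports only one number, yet the estimate is far finer than the pixel pitch because the position is encoded in the relative responses of several neighboring pixels. In a real receptor array, noise rather than pixel size would limit the precision.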
In the figure on the right, the retinal mosaic has superimposed on it, at top, the images of two stars at resolution limit when the intervening gap assures judgment that there are two stars and not a single elongated one. Shown below are the images of two separate short lines; the precision of the read-out of their location difference transcends the dimension of the mosaic elements.
Details of the neural apparatus for achieving hyperacuity still await discovery. That the hyperacuity apparatus involves signals from a range of individual receptor cells, usually in more than one location of the stimulus space, has implications concerning performance in these tasks. Low contrast, close proximity of neighboring stimuli (crowding), and temporal asynchrony of pattern components are examples of factors that cause reduced performance.[4] Of some conceptual interest are age changes[5] and susceptibility to perceptual learning[6] which can help in understanding underlying neural channeling.
Two basic algorithms have been proposed to explain mammalian visual hyperacuity: a spatial one, based on population firing rates, and a temporal one, based on temporal delays in response to miniature eye movements. Neither has gained empirical support so far, and the plausibility of the former has been critically questioned on the grounds of the discrete nature of neural firing.[7]
The optics of the human eye are extremely simple, the main imaging component being a single element lens which can change its strength by muscular control. There is only limited facility for correction of many of the aberrations which are normally corrected in good quality instrumental optical systems.[8] Such a simple lens must inevitably have a significant amount of spherical aberration, which produces secondary lobes in the spread function. However, it has been found by experiment that light entering the pupil off-axis is less efficient in creating an image (the Stiles-Crawford effect), which has the effect of substantially reducing these unwanted side lobes. Also, the effects of diffraction limits can, with care, be used to partially compensate for the aberrations.
The retinal receptors are physically situated behind a neural layer carrying the post-retinal processing elements. Light cannot pass through this layer undistorted. In fact, measurements on the Modulation Transfer Function (MTF) suggest that the MTF degradations due to the diffusion through that neural layer are of a similar order as those due to the optics. By interplay of these different components it has been found that the overall optical quality, although poor compared to photographic optics, can remain tolerably near constant through a considerable range of pupil diameters and light levels.
When presented with colored information the optical imperfections are particularly great. The optics have residual uncorrected chromatic aberration of nearly 2 dioptres from extreme red to extreme blue/violet, mainly in the green to blue/violet region. Ophthalmologists have for many decades used this large change of focus through the spectrum in the process of providing correction spectacles. This means that such corrections can be as simple as the eye lens itself.
In addition, this large chromatic aberration has also been used to advantage within the make up of the eye itself. Instead of having the three primary colors (red, green & blue) to manipulate, nature has used this gross chromatic shift to provide a cortical visual function which is based on three sets of color opponency instead of three basic primary colors.[9] These are red / green, yellow / blue and black / white, this black / white being synonymous with brightness. Then, by using just one very high resolution opponency between red & green primaries, nature cleverly uses a mean of these two colors (i.e. yellow), together with very low resolution blue to create a background color wash capability. In turn (by using the hyperacuity capability on the low resolution opponency) this can also serve as the source of perception of 3D depth.
The human eye has a roughly hexagonal matrix of photodetectors.[10] There is now considerable evidence that such a matrix layout provides optimum efficiency of information transfer. A number of other workers have considered using hexagonal matrices, but they then seem to subscribe to a mathematical approach and axes at 60 degrees differential orientation. In turn this must then make use of complex numbers. Overington & his team sought (and found), instead, a way to approximate to a hexagonal matrix, while at the same time retaining a conventional Cartesian layout for processing.
Although there are many and varied spatial interactions evident in the early neural networks of the human visual system, only a few are of great importance in high fidelity information sensing. The rest are predominantly associated with processes such as local adaptation. It has therefore been found that the most important interactions are of very local extent, but it is the subtleties of usage of these interactions which seem most important. For hexagonal matrices a single ring of six receptors surrounding an addressed pixel is the simplest symmetrical layout. The general finding from primate receptive field studies is that any such local group yields no output for a uniform input illumination. So this is essentially similar to one of the classical Laplacian receptive fields for square arrays: that with weightings of -1 on each side and -0.5 on each corner. The only difference is an aspect ratio of 8:7.07 (or approximately 8:7 to within 1%). Very useful further evidence of the processes going on in this area comes from the electron-microscopy studies of Kolb.[11] These clearly show the neural structures which lead to difference signals being transmitted further. If one combines a point spread function having a Gaussian form and having an S.D. of 1.3 'pixels' with a single ring Laplacian-type operator, the resultant is a function with very similar properties to a DOG function as discussed by Marr.[12]
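The property of yielding no output for uniform illumination follows from the weights summing to zero. The following is a minimal sketch of the square-array Laplacian mentioned above, assuming a central weight of +6; the text gives only the surround weights (-1 on each side, -0.5 on each corner), so the central value is inferred here as the one that balances them. The names `KERNEL` and `apply_kernel` are hypothetical.

```python
# Square-array Laplacian-type kernel: -1 on each side, -0.5 on each corner.
# The central weight of +6 is an assumption, chosen so that the weights sum to zero.
KERNEL = [
    [-0.5, -1.0, -0.5],
    [-1.0,  6.0, -1.0],
    [-0.5, -1.0, -0.5],
]

def apply_kernel(image, row, col):
    """Weighted sum of the 3 x 3 neighborhood centered on (row, col)."""
    return sum(
        KERNEL[dr + 1][dc + 1] * image[row + dr][col + dc]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
    )

# Uniform illumination: every pixel has the same value.
uniform = [[7.0] * 5 for _ in range(5)]
uniform_out = apply_kernel(uniform, 2, 2)

# A vertical step edge between columns 2 and 3.
step = [[0.0, 0.0, 0.0, 1.0, 1.0] for _ in range(5)]
dark_side = apply_kernel(step, 2, 2)    # pixel just on the dark side of the edge
bright_side = apply_kernel(step, 2, 3)  # pixel just on the bright side of the edge

print("uniform field :", uniform_out)
print("dark side     :", dark_side)
print("bright side   :", bright_side)
```

The uniform field gives zero, while the two pixels flanking the edge give responses of opposite sign, which is the behavior expected of a second-difference operator.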
It is normally assumed, both in computer image processing and in visual science, that a local excitatory / inhibitory process is effectively a second differencing process. However, there seems to be strong psychophysical evidence for human vision that it is first differences which control human visual performance. It is necessary for the positive & negative parts of all outputs from Laplacian-like neurones to be separated for sending onwards to the cortex, since it is impossible to transmit negative signals. This means that each neurone of this type must be considered to be a set of six dipoles, such that each surround inhibition can only cancel its own portion of the central stimulation. Such a separation of positive and negative components is totally compatible with retinal physiology and is one possible function for the known pair of midget bipolar channels for each receptor.[13]
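One possible reading of the dipole description can be sketched as follows. This is an illustration under stated assumptions, not a model confirmed by the cited physiology: each of the six surround receptors is paired with an equal share of the center, the difference for each pair is formed separately, and positive and negative parts are summed into two non-negative channels. The function name `dipole_channels` and the sample values are hypothetical.

```python
def dipole_channels(center, surround):
    """Split six center-minus-surround differences into two non-negative channels.

    Each dipole compares the center with one surround receptor, so a given
    surround value can only cancel its own share of the central stimulation.
    """
    diffs = [center - s for s in surround]
    on = sum(d for d in diffs if d > 0)    # center brighter than that neighbor
    off = sum(-d for d in diffs if d < 0)  # center darker than that neighbor
    return on, off

# Uniform illumination: both channels are silent.
uniform_on, uniform_off = dipole_channels(5.0, [5.0] * 6)

# A linear luminance ramp across a hexagonal ring (neighbors at 0, 60, ..., 300 degrees).
ramp_surround = [6.0, 5.5, 4.5, 4.0, 4.5, 5.5]
ramp_on, ramp_off = dipole_channels(5.0, ramp_surround)

print("uniform:", uniform_on, uniform_off)
print("ramp   :", ramp_on, ramp_off)
print("ramp, plain Laplacian sum:", ramp_on - ramp_off)
```

On the ramp, the plain Laplacian sum is zero, but each separated channel carries a signal proportional to the gradient. Under these assumptions, separating positive and negative parts makes first-difference information available even though the underlying operator is a second difference.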
The basic evidence for orientation sensing in human vision is that it appears to be carried out (in Area 17 of the striate cortex) by banks of neurones at fairly widely spaced orientations.[14] The neurones as measured have characteristically elliptical receptive fields.[15] However, both the actual interval between the orientations and the exact form and aspect ratio of the elliptical fields are open to question. Moreover, the measured receptive fields must already have been compounded with the midget receptive fields at the retina: for probe measurements of 'single neurone' performance, the receptive field measured includes the effects of all stages of optical and neural processing that have gone before.
For orientation-specific units operating on a hexagonal matrix, it makes most sense to have them with their primary and secondary axes occurring every 30 degrees of orientation. This 30 degree separation of orientations agrees with the angular spacing of such units deduced to be desirable by John Canny from a mathematical approach.[16] In the absence of specific details, it seemed that a roughly best compromise between computational efficiency and simplicity on the one hand and adequate orientational tuning on the other should be of extent 5 x 1 pixels. This again agrees with that independently suggested by Canny and also observed in primate vision studies by other researchers. The receptive field units have orientation tuning functions which bear a satisfying resemblance to the orientation tuning functions established for vision by psychophysical tests.
There is the possibility of recombining the partial difference functions arriving at the cortex in two ways.[17] It is possible to consider analysis of a second difference map - by searching for zero crossings, which was most popular until the mid-1980s. Alternatively one can sense local peaks in the first difference map, which has become increasingly popular since then. This latter then depends on finding the position of the peak of the edge image by a 3 x 1 analysis & quadratic curve fitting. In either case it has been shown that the edge position can be located to something better than 0.1 pixels because of the very broad spread of the edge due to the poor optical image, while it has also been shown that, by equally simple arithmetic, the local edge orientation can be derived to better than 1 degree. Furthermore, the interplay of first and second difference data provides very powerful means of analyzing motion, stereo, color, texture & other scene properties.
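The peak-finding route can be illustrated with a minimal sketch of three-point quadratic interpolation. It assumes that the first difference of a blurred edge has a Gaussian profile (the derivative of a step blurred by a Gaussian), with an illustrative standard deviation of 1.3 pixels and an arbitrary edge position; the function name `subpixel_peak` is hypothetical, and the formula is the standard parabola fit through three equally spaced samples rather than a method confirmed by the cited work.

```python
import math

def subpixel_peak(samples):
    """Locate the maximum of a sampled profile by fitting a parabola
    through the largest interior sample and its two neighbors."""
    k = max(range(1, len(samples) - 1), key=lambda i: samples[i])
    left, mid, right = samples[k - 1], samples[k], samples[k + 1]
    denom = left - 2.0 * mid + right
    if denom == 0.0:
        return float(k)  # flat triple: no curvature to fit
    return k + 0.5 * (left - right) / denom

# Illustrative values (assumptions): edge at a fractional pixel position.
true_edge = 3.3
sigma = 1.3
first_difference = [
    math.exp(-((x - true_edge) ** 2) / (2.0 * sigma ** 2)) for x in range(7)
]

estimate = subpixel_peak(first_difference)
print(f"true edge position : {true_edge}")
print(f"3 x 1 quadratic fit: {estimate:.3f}")
print(f"error (in pixels)  : {abs(estimate - true_edge):.3f}")
```

Because the blurred edge spreads over several pixels, the three samples around the maximum differ smoothly and the fitted vertex falls well within a tenth of a pixel of the true position in this noise-free case.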
The distinction between resolving power or acuity, literally sharpness, which depends on the spacing of the individual receptors through which the outside world is sampled, and the ability to identify individual locations in the sensory space is universal among modalities. There are many other examples where the organism's performance substantially surpasses the spacing of the concerned receptor cell population. The normal human has only three kinds of color receptors in the retina, yet in color vision, by subtly weighing and comparing their relative output, one can detect thousands of hues. Braille reading involves hyperacuity among touch receptors in the fingertips.[18] We can hear many more different tones than there are hair cells in the cochlea; pitch discrimination, without which a violin could not be played in tune, is a hyperacuity.[19] Hyperacuity has been identified in many animal species, for example in the detection of prey by the electric fish,[20] echolocation in the bat,[21] and in the ability of rodents to localize objects based on mechanical deformations of their whiskers.[22]
In clinical vision tests,[23] hyperacuity has a special place because its processing is at the interfaces of the eye's optics, retinal functions, activation of the primary visual cortex and the perceptual apparatus. In particular, the determination of normal stereopsis is a hyperacuity task. Hyperacuity perimetry is used in clinical trials evaluating therapies for retinal degenerative changes.[24]