The consciousness and binding problem is the problem of how objects, background, and abstract or emotional features are combined into a single experience.[1]
The binding problem refers to how our brain circuits jointly encode decisions, actions, and perception. It is considered a "problem" because no complete model of this combination exists.
The binding problem can be subdivided into the four areas of perception, neuroscience, cognitive science, and the philosophy of mind. It includes general considerations on coordination, the subjective unity of perception, and variable binding.[2]
Attention is crucial in determining which phenomena appear to be bound together, noticed, and remembered.[3] This aspect of the binding problem is generally discussed in terms of temporal synchrony. At the most basic level, all neural firing and its adaptation depend on the precise timing of spikes (Feldman, 2010). At a much larger scale, recurring patterns in large-scale neural activity are a major diagnostic and scientific tool.[4]
A popular hypothesis, advanced by neuroscientist Peter Milner in his 1974 article A Model for Visual Shape Recognition, is that features of individual objects are bound/segregated via synchronization of the activity of different neurons in the cortex.[5][6] The proposed mechanism, called binding-by-synchrony (BBS), is hypothesized to operate through transient mutual synchronization of neurons located in different regions of the brain when the stimulus is presented.[7] Empirical interest in the idea grew after von der Malsburg proposed that feature binding poses a special problem that cannot be solved by cellular firing rates alone.[8] However, it has since been argued that no special problem arises, because cortical modules have been shown to code jointly for multiple features.[9] Temporal synchrony is most relevant to the first subdivision, "general considerations on coordination", because it is an effective way to take in surroundings and is well suited to grouping and segmentation. A number of studies suggested that there is indeed a relationship between rhythmic synchronous firing and feature binding. This rhythmic firing appears to be linked to intrinsic oscillations in neuronal somatic potentials, typically in the gamma range around 40–60 Hz.[10] The positive arguments for a role for rhythmic synchrony in resolving the segregational object-feature binding problem have been summarized by Singer.[11] There is certainly extensive evidence for synchronization of neural firing as part of responses to visual stimuli.
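The BBS idea can be illustrated with a toy simulation. The following is a minimal sketch, not any specific published model: two assemblies of Kuramoto-style phase oscillators with gamma-range natural frequencies are coupled only within each assembly, so oscillators coding the same hypothetical object phase-lock while the population as a whole does not. All parameter values and the grouping are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
group = np.repeat([0, 1], 4)                 # two assemblies of 4 oscillators
n = group.size
omega = 2 * np.pi * rng.uniform(40, 60, n)   # gamma-range natural frequencies
K = 300.0                                    # strong within-assembly coupling
theta = rng.uniform(0, 2 * np.pi, n)         # initial phases
dt = 1e-4
hist = []

for step in range(20000):                    # 2 s of simulated time, Euler steps
    dtheta = omega.copy()
    for i in range(n):
        same = group == group[i]             # couple only within the assembly
        dtheta[i] += K * np.mean(np.sin(theta[same] - theta[i]))
    theta += dt * dtheta
    if step >= 10000:                        # discard the locking transient
        hist.append(theta.copy())

hist = np.array(hist)

def coherence(ph):
    # time-averaged Kuramoto order parameter; 1.0 = perfect phase locking
    return np.mean(np.abs(np.mean(np.exp(1j * ph), axis=1)))

print("within A:", coherence(hist[:, group == 0]))   # close to 1: "bound"
print("within B:", coherence(hist[:, group == 1]))   # close to 1: "bound"
print("whole population:", coherence(hist))          # lower: assemblies unbound
```

Reading the phases off at any instant, membership in a synchronized assembly is what tags features as belonging to the same object in this scheme.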
However, there is inconsistency between findings from different laboratories. Moreover, a number of recent reviewers, including Shadlen and Movshon, and Merker,[12] have raised concerns that the theory may be untenable. Thiele and Stoner found that perceptual binding of two moving patterns (coherent versus non-coherent plaids) had no effect on the synchronization of the neurons responding to them.[13] In the primary visual cortex, Dong et al. found that whether two neurons were responding to contours of the same shape or of different shapes had no effect on neural synchrony: synchrony was independent of the binding condition.
Shadlen and Movshon raise a series of doubts about both the theoretical and the empirical basis for the idea of segregational binding by temporal synchrony. There is no biophysical evidence that cortical neurons are selective to synchronous input at this level of precision, and cortical activity with synchrony this precise is rare. Synchronization has also been connected to endorphin activity. Precise spike timing, it has been argued, may not be necessary for a mechanism of visual binding, figuring only in models of certain neuronal interactions. In contrast, Seth[14] describes a brain-based artificial robot in which multiple, separate, widely distributed neural circuits fire at different phases, suggesting that regular brain oscillations at specific frequencies are essential to the neural mechanisms of binding.
Goldfarb and Treisman[15] point out that a logical problem appears to arise for binding solely via synchrony if there are several objects that share some of their features but not others. At best, synchrony can facilitate segregation supported by other means (as von der Malsburg acknowledges).[16]
A number of neuropsychological studies suggest that the association of color, shape, and movement as "features of an object" is not simply a matter of linking or "binding", and that failing to bind elements into groups makes such associations inefficient.[17] They also give extensive evidence for top-down feedback signals that ensure that sensory data are handled as features of (sometimes wrongly) postulated objects early in processing. Pylyshyn[18] has likewise emphasized the way the brain seems to pre-conceive objects to which features are then allocated, and to which continuing existence is attributed even if features such as color change. This is because visual integration increases over time, and indexing visual objects helps to ground visual concepts.
The visual feature binding problem refers to the question of why we do not confuse a red circle and a blue square with a blue circle and a red square. Understanding of the brain circuits engaged in visual feature binding is increasing. Because different visual features are encoded in separate cortical areas, a binding process is required for us to combine them into accurate percepts.
In her feature integration theory, Treisman suggested that the first stage of binding between features is mediated by the features' links to a common location. The second stage, combining the individual features of an object, requires attention, and selecting the object occurs within a "master map" of locations. Psychophysical demonstrations of binding failures under conditions of full attention provide support for the idea that binding is accomplished through common location tags.
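The two-stage scheme can be made concrete with a toy example. The following sketch is illustrative only; the maps, locations, and feature values are invented for the example rather than taken from Treisman's work. Preattentive feature maps register features keyed by location, and an attentional "spotlight" on one master-map location conjoins whatever is registered there.

```python
# Stage 1: parallel, preattentive feature maps, each keyed by location.
color_map = {(0, 0): "red", (5, 2): "blue"}
shape_map = {(0, 0): "circle", (5, 2): "square"}

def attend(location):
    """Stage 2: attending to one master-map location binds the features
    registered at that location into a single object description."""
    return (color_map.get(location), shape_map.get(location))

print(attend((0, 0)))  # ('red', 'circle'): the correct conjunction
print(attend((5, 2)))  # ('blue', 'square')

# Without the shared location tag, only the unordered feature sets
# {red, blue} and {circle, square} are available, so an "illusory
# conjunction" such as a blue circle cannot be ruled out.
```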
An implication of these approaches is that sensory data such as color or motion may not normally exist in "unallocated" form. For Merker:[19] "The 'red' of a red ball does not float disembodied in an abstract color space in V4." If color information allocated to a point in the visual field is converted directly, via the instantiation of some form of propositional logic (analogous to that used in computer design), into color information allocated to an "object identity" postulated by a top-down signal, as suggested by Purves and Lotto (e.g., there is blue here + Object 1 is here = Object 1 is blue), then no special computational task of "binding together" by means such as synchrony may exist. (Although von der Malsburg[20] poses the problem in terms of binding "propositions" such as "triangle" and "top", these, in isolation, are not propositional.)
How signals in the brain come to have propositional content, or meaning, is a much larger issue. However, both Marr[21] and Barlow[22] suggested, on the basis of what was known about neural connectivity in the 1970s, that the final integration of features into a percept would be expected to resemble the way words operate in sentences.
The role of synchrony in segregational binding remains controversial. Merker[19] has more recently suggested that synchrony may be a feature of active brain areas that reflects an "infrastructural" property of the computational system, analogous to the increased oxygen demand indicated via BOLD signal contrast imaging. Apparent specific correlations with segregational tasks may be explainable on the basis of the interconnectivity of the areas involved. As a possible manifestation of a need to balance excitation and inhibition over time, it might be expected to be associated with reciprocal re-entrant circuits, as in the model of Seth et al.[14] (Merker gives the analogy of the whistle from an audio amplifier receiving its own output.)
Visual feature binding has been suggested to depend on selective attention to the locations of objects. If spatial attention does indeed play a role in binding integration, it will do so primarily when object location acts as a binding cue. Functional MRI findings show that regions of the parietal cortex involved in spatial attention are more engaged in feature-conjunction tasks than in single-feature tasks. When multiple objects were shown simultaneously at different locations, the parietal cortex was activated, whereas when multiple objects were shown sequentially at the same location it was less engaged.[23]
Defoulzi et al. investigated feature binding across two feature dimensions to disambiguate whether a specific combination of color and motion direction is perceived as bound or unbound. Two behaviorally relevant features (color and motion) belonging to the same object define the "bound" condition, whereas in the "unbound" condition the features belong to different objects. Local field potentials recorded from the lateral prefrontal cortex (lPFC) in monkeys were monitored during different stimulus configurations. The findings suggest a neural representation of visual feature binding in the 4–12 Hz frequency band, with binding information relayed through different lPFC neural subpopulations. The data also show the behavioral relevance of binding information, which was linked to the animals' reaction times. This implicates the prefrontal cortex, targeted by the dorsal and ventral visual streams, in binding visual features from different dimensions (color and motion).[24]
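As a rough illustration of the kind of signal analysis such a study involves (a generic sketch, not the authors' actual pipeline; the sampling rate and synthetic signal are invented), a band-pass filter can isolate the 4–12 Hz component of a local field potential before its power is compared across bound and unbound trials:

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000.0                              # sampling rate in Hz (assumed)
t = np.arange(0, 2, 1 / fs)
lfp = (np.sin(2 * np.pi * 8 * t)         # synthetic 8 Hz component of interest
       + 0.5 * np.sin(2 * np.pi * 50 * t)            # gamma-range contamination
       + 0.3 * np.random.default_rng(0).normal(size=t.size))  # noise

# 4th-order Butterworth band-pass, corners normalized by the Nyquist frequency
b, a = butter(4, [4 / (fs / 2), 12 / (fs / 2)], btype="band")
band = filtfilt(b, a, lfp)               # zero-phase 4-12 Hz band-pass

# Band power, e.g., to compare across "bound" vs. "unbound" trial conditions
print("4-12 Hz power:", np.mean(band ** 2))
```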
It has been suggested that visual feature binding involves two different mechanisms in visual perception. One mechanism is agnostic to the familiarity of possible feature combinations and integrates features over several temporal integration windows; this process is speculated to be mediated by neural synchronization processes in the visual cortex. The second mechanism is mediated by familiarity with the stimulus and is provided by attentional top-down support from familiar objects.[25]
Smythies[26] defines the combination problem, also known as the subjective unity of perception, as "How do the brain mechanisms actually construct the phenomenal object?". Revonsuo[1] equates this to "consciousness-related binding", emphasizing the entailment of a phenomenal aspect. As Revonsuo explores in 2006,[27] there are nuances of difference beyond the basic BP1:BP2 division. Smythies speaks of constructing a phenomenal object ("local unity" for Revonsuo), but philosophers such as René Descartes, Gottfried Wilhelm Leibniz, Immanuel Kant, and William James (see Brook and Raymont)[28] have typically been concerned with the broader unity of a phenomenal experience ("global unity" for Revonsuo), which, as Bayne[29] illustrates, may involve features as diverse as seeing a book, hearing a tune, and feeling an emotion. Further discussion will focus on this more general problem of how sensory data that may have been segregated into, for instance, "blue square" and "yellow circle" are to be re-combined into a single phenomenal experience of a blue square next to a yellow circle, plus all other features of their context. There is a wide range of views on just how real this "unity" is, but the existence of medical conditions in which it appears to be subjectively impaired, or at least restricted, suggests that it is not entirely illusory.[30]
There are many neurobiological theories about the subjective unity of perception. Different visual features such as color, size, shape, and motion are computed by largely distinct neural circuits, but we experience them as an integrated whole. The different visual features interact with each other in various ways. For example, shape discrimination of objects is strongly affected by orientation but only slightly affected by object size.[31] Some theories suggest that global perception of the integrated whole involves higher-order visual areas.[32] There is also evidence that the posterior parietal cortex is responsible for perceptual scene segmentation and organization.[33] Bodies facing each other are processed as a single unit, and there is increased coupling of the extrastriate body area (EBA) and the posterior superior temporal sulcus (pSTS) when bodies are facing each other.[34] This suggests that the brain is biased towards grouping humans in twos or dyads.[35]
Early philosophers René Descartes and Gottfried Wilhelm Leibniz noted that the apparent unity of our experience is an all-or-none qualitative characteristic that does not appear to have an equivalent in the known quantitative features, like proximity or cohesion, of composite matter. William James,[36] in the nineteenth century, considered the ways the unity of consciousness might be explained by known physics and found no satisfactory answer. He coined the term "combination problem" in the specific context of a "mind-dust theory", in which it is proposed that a full human conscious experience is built up from proto- or micro-experiences in the way that matter is built up from atoms. James claimed that such a theory was incoherent, since no causal physical account could be given of how distributed proto-experiences would "combine". He favored instead a concept of "co-consciousness" in which there is one "experience of A, B and C" rather than combined experiences. A detailed discussion of subsequent philosophical positions is given by Brook and Raymont.[28] However, these do not generally include physical interpretations.
Whitehead[37] proposed a fundamental ontological basis for a relation consistent with James's idea of co-consciousness, in which many causal elements are co-available or "compresent" in a single event or "occasion" that constitutes a unified experience. Whitehead did not give physical specifics, but the idea of compresence is framed in terms of causal convergence in a local interaction consistent with physics. Where Whitehead goes beyond anything formally recognized in physics is in the "chunking" of causal relations into complex but discrete "occasions". Even if such occasions can be defined, Whitehead's approach still leaves James's difficulty with finding a site, or sites, of causal convergence that would make neurobiological sense for "co-consciousness". Sites of signal convergence do clearly exist throughout the brain but there is a concern to avoid re-inventing what Daniel Dennett[38] calls a Cartesian Theater or a single central site of convergence of the form that Descartes proposed.
Descartes's central "soul" is now rejected because neural activity closely correlated with conscious perception is widely distributed throughout the cortex. The remaining choices appear to be either separate involvement of multiple distributed causally convergent events or a model that does not tie a phenomenal experience to any specific local physical event but rather to some overall "functional" capacity. Whichever interpretation is taken, as Revonsuo[1] indicates, there is no consensus on what structural level we are dealing with – whether the cellular level, that of cellular groups as "nodes", "complexes" or "assemblies" or that of widely distributed networks. There is probably only general agreement that it is not the level of the whole brain, since there is evidence that signals in certain primary sensory areas, such as the V1 region of the visual cortex (in addition to motor areas and cerebellum), do not contribute directly to phenomenal experience.
Stoll and colleagues conducted an fMRI experiment to see whether participants would view a dynamic bistable stimulus globally or locally. Responses in lower visual cortical regions were suppressed when participants viewed the stimulus globally. However, when global perception occurred without shape grouping, higher cortical regions were suppressed instead. This experiment suggests that higher-order cortex is important in perceptual grouping.
Grassi and colleagues used three different motion stimuli to investigate scene segmentation, that is, how meaningful entities are grouped together and separated from other entities in a scene. Across all stimuli, scene segmentation was associated with increased activity in the posterior parietal cortex and decreased activity in lower visual areas. This suggests that the posterior parietal cortex is important for viewing an integrated whole.
Mersad and colleagues used an EEG frequency-tagging technique to differentiate between brain activity for the integrated whole object and brain activity for parts of the object.[39] The results showed that the visual system binds two humans in close proximity as part of an integrated whole. These results are consistent with evolutionary theories that face-to-face bodies are one of the earliest representations of social interaction, and they support other experimental work showing that body-selective visual areas respond more strongly to facing bodies.[40]
Experiments have shown that ferritin and neuromelanin in fixed human substantia nigra pars compacta (SNc) tissue are able to support widespread electron tunneling.[41] Further experiments have shown that ferritin structures similar to ones found in SNc tissue are able to conduct electrons over distances as great as 80 microns, and that they behave in accordance with Coulomb blockade theory to perform a switching or routing function.[42][43] Both of these observations are consistent with earlier predictions that are part of a hypothesis that ferritin and neuromelanin can provide a binding mechanism associated with an action selection mechanism,[44] although the hypothesis itself has not yet been directly investigated. The hypothesis and these observations have been applied to Integrated Information Theory.[45]
Daniel Dennett[38] has proposed that our sense of experiencing single, unified events is illusory and that, instead, at any one time there are "multiple drafts" of sensory patterns at multiple sites, each covering only a fragment of what we think we experience. Arguably, Dennett is claiming that consciousness is not unified and there is no phenomenal binding problem. Most philosophers have difficulty with this position (see Bayne), but some physiologists agree with it. In particular, the demonstration of perceptual asynchrony in psychophysical experiments by Moutoussis and Zeki,[46][47] in which color is perceived before the orientation of lines and before motion by 40 and 80 ms respectively, suggests that different attributes are consciously perceived at different times, and hence that, at least over these brief periods after visual stimulation, different events are not bound to each other, implying a disunity of consciousness over such intervals.[48] Dennett's view might be in keeping with evidence from recall experiments and change blindness purporting to show that our experiences are much less rich than we sense them to be, what has been called the Grand Illusion.[49] However, few, if any, other authors suggest the existence of multiple partial "drafts". Moreover, also on the basis of recall experiments, Lamme[50] has challenged the idea that richness is illusory, emphasizing that phenomenal content cannot be equated with content to which there is cognitive access.
Dennett does not tie drafts to biophysical events. Multiple sites of causal convergence are invoked in specific biophysical terms by Edwards[51] and Sevush.[52] In this view the sensory signals to be combined in phenomenal experience are available, in full, at each of multiple sites. To avoid non-causal combination, each site/event is placed within an individual neuronal dendritic tree. The advantage is that "compresence" is invoked just where convergence occurs neuro-anatomically. The disadvantage, as for Dennett, is the counter-intuitive concept of multiple "copies" of experience. The precise nature of an experiential event or "occasion", even if local, also remains uncertain.
The majority of theoretical frameworks for the unified richness of phenomenal experience adhere to the intuitive idea that experience exists as a single copy, and draw on "functional" descriptions of distributed networks of cells. Baars[53] has suggested that certain signals, encoding what we experience, enter a "Global Workspace" within which they are "broadcast" to many sites in the cortex for parallel processing. Dehaene, Changeux and colleagues[54] have developed a detailed neuro-anatomical version of such a workspace. Tononi and colleagues[55] have suggested that the level of richness of an experience is determined by the narrowest information interface "bottleneck" in the largest sub-network or "complex" that acts as an integrated functional unit. Lamme[50] has suggested that networks supporting reciprocal signaling rather than those merely involved in feed-forward signaling support experience. Edelman and colleagues have also emphasized the importance of re-entrant signaling.[56] Cleeremans[57] emphasizes meta-representation as the functional signature of signals contributing to consciousness.
In general, such network-based theories are not explicitly theories of how consciousness is unified, or "bound", but rather theories of functional domains within which signals contribute to unified conscious experience. A concern about functional domains is what Rosenberg[58] has called the boundary problem; it is hard to find a unique account of what is to be included and what excluded. Nevertheless, this is, if anything is, the consensus approach.
Within the network context, a role for synchrony has been invoked as a solution to the phenomenal binding problem as well as the computational one. In his book The Astonishing Hypothesis,[59] Crick appears to be offering a solution to BP2 as much as BP1. Even von der Malsburg[60] introduces detailed computational arguments about object feature binding with remarks about a "psychological moment". The Singer group[61] also appear to be interested as much in the role of synchrony in phenomenal awareness as in computational segregation.
The apparent incompatibility of using synchrony to both segregate and unify might be explained by sequential roles. However, Merker[19] points out what appears to be a contradiction in attempts to solve the subjective unity of perception in terms of a functional (effectively meaning computational) rather than a local biophysical domain in the context of synchrony.
Functional arguments for a role for synchrony are in fact underpinned by analysis of local biophysical events. However, Merker[19] points out that the explanatory work is done by the downstream integration of synchronized signals in post-synaptic neurons: "It is, however, by no means clear what is to be understood by 'binding by synchrony' other than the threshold advantage conferred by synchrony at, and only at, sites of axonal convergence onto single dendritic trees..." In other words, although synchrony is proposed as a way of explaining binding on a distributed rather than a convergent basis, the justification rests on what happens at convergence. Signals for two features are proposed as bound by synchrony because synchrony effects downstream convergent interaction. Any theory of phenomenal binding based on this sort of computational function would seem to follow the same principle. The phenomenality would entail convergence, if the computational function does.
Many of the quoted models assume that computational and phenomenal events, at least at some point in the sequence of events, parallel each other in some way. The difficulty remains in identifying what that way might be. Merker's[19] analysis suggests either (1) that both computational and phenomenal aspects of binding are determined by convergence of signals on neuronal dendritic trees, or (2) that our intuitive ideas about the need for "binding" in a "holding together" sense in both computational and phenomenal contexts are misconceived. We may be looking for something extra that is not needed. Merker, for instance, argues that the homotopic connectivity of sensory pathways does the necessary work.
In modern connectionism, cognitive neuroarchitectures have been developed (e.g. "Oscillatory Networks",[62] the "Integrated Connectionist/Symbolic (ICS) Cognitive Architecture",[63] "Holographic Reduced Representations (HRRs)",[64] and the "Neural Engineering Framework (NEF)"[65]) that solve the binding problem by means of integrative synchronization mechanisms (e.g. the phase-synchronized binding-by-synchrony (BBS) mechanism) in two domains:
(1) in perceptual cognition ("low-level cognition"): the neurocognitive process by which a perceived object or event (e.g., a visual object) is dynamically "bound together" from its properties (e.g., shape, contour, texture, color, direction of motion) into a mental representation, i.e., can be experienced in the mind as a unified "Gestalt" in the sense of Gestalt psychology ("feature binding", "feature linking"),
(2) and in language cognition ("high-level cognition"): the neurocognitive process by which a linguistic unit (e.g., a sentence) is generated by dynamically relating semantic concepts and syntactic roles to each other, so that systematic and compositional symbol structures and propositions can be generated and experienced as complex mental representations in the mind ("variable binding").[66][67][68][69]
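As an illustration of the variable-binding case, the following is a minimal sketch of role-filler binding in the style of Holographic Reduced Representations, one of the architectures listed above: roles and fillers are random high-dimensional vectors, binding is circular convolution, and unbinding uses the approximate inverse (involution). The vector dimension, role names, and vocabulary are illustrative assumptions, not drawn from the cited works.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2048                            # dimensionality of the HRR vectors

def vec():
    # random vector with expected unit norm (standard HRR convention)
    return rng.normal(0, 1 / np.sqrt(d), d)

def bind(a, b):
    # circular convolution, computed via the FFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def inv(a):
    # involution: the approximate inverse under circular convolution
    return np.concatenate(([a[0]], a[:0:-1]))

AGENT, PATIENT = vec(), vec()       # syntactic roles
mary, john = vec(), vec()           # semantic fillers

# Superpose the role-filler bindings for "Mary (agent) ... John (patient)"
sentence = bind(AGENT, mary) + bind(PATIENT, john)

# Unbind: who is the agent? The result is noisy, so compare it
# against a clean-up lexicon and take the best match.
probe = bind(sentence, inv(AGENT))
lexicon = {"mary": mary, "john": john}
print(max(lexicon, key=lambda w: np.dot(probe, lexicon[w])))  # 'mary'
```

The design point such architectures exploit is that one fixed-width vector can hold an entire compositional structure, with bound pairs recoverable on demand rather than stored separately.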
According to Igor Val Danilov,[70] knowledge about neurophysiological processes during shared intentionality can reveal insights into the binding problem and even the development of object perception, since intentionality emerges before organisms confront the binding problem. At the beginning of life, the environment is a cacophony of stimuli: electromagnetic waves, chemical interactions, and pressure fluctuations. Because the environment is uncategorized for the organism at this early stage of development, sensation is too limited by noise to solve the cue problem: the relevant stimulus cannot overcome the noise if it passes through the senses alone. While very young organisms need to combine objects, background, and abstract or emotional features into a single experience in order to build a representation of the surrounding reality, they cannot independently distinguish the relevant sensory stimuli to integrate into object representations. Even the embodied dynamical-systems approach cannot get around this cue-to-noise problem. The application of embodied information requires an environment already categorized into objects, a holistic representation of reality, which occurs through (and only after the emergence of) perception and intentionality.[71][72]