Rescorla–Wagner model explained

The Rescorla–Wagner model ("R-W") is a model of classical conditioning in which learning is conceptualized in terms of associations between conditioned (CS) and unconditioned (US) stimuli. A strong CS-US association means that the CS signals, or predicts, the US. One might say that before conditioning the subject is surprised by the US, but after conditioning the subject is no longer surprised, because the CS predicts the coming of the US. The model casts the conditioning process into discrete trials, during which stimuli may be either present or absent. The strength of prediction of the US on a trial is represented as the summed associative strengths of all CSs present during the trial. This feature of the model represented a major advance over previous models, and it allowed a straightforward explanation of important experimental phenomena, most notably the blocking effect. Failures of the model have led to modifications, alternative models, and many additional findings. The model has also had some impact on neuroscience, as studies have suggested that the phasic activity of midbrain dopamine neurons in mesostriatal projections encodes the type of prediction error detailed in the model.^[1]

The Rescorla–Wagner model was created by Yale psychologists Robert A. Rescorla and Allan R. Wagner in 1972.

Basic assumptions of the model

The change in the association between a CS and a US that occurs when the two are paired depends on how strongly the US is predicted on that trial, that is, informally, on how "surprised" the organism is by the US. The amount of this "surprise" depends on the summed associative strength of all cues present during that trial. In contrast, previous models derived the change in associative strength from the current value of the CS alone.
The associative strength of a CS is represented by a single number. The association is excitatory if the number is positive, inhibitory if it is negative.
The associative strength of a stimulus is expressed directly by the behavior it elicits or inhibits.
The salience of a CS (alpha in the equation) and the strength of the US (beta) are constants and do not change during training.
Only the current associative strength of a cue determines its effect on behavior and the amount of learning it supports. It does not matter how that strength value was arrived at, whether by simple conditioning, reconditioning, or otherwise.

The first two assumptions were new in the Rescorla–Wagner model. The last three assumptions were present in previous models and are less crucial to the R-W model's novel predictions.

Equation

\Delta V_X^{n+1} = \alpha_X \beta (\lambda - V_{tot})

and

V_X^{n+1} = V_X^n + \Delta V_X^{n+1}

where

\Delta V_X is the change in the strength, on a single trial, of the association between the CS labelled "X" and the US

\alpha_X is the salience of X (bounded by 0 and 1)

\beta is the rate parameter for the US (bounded by 0 and 1), sometimes called its association value

\lambda is the maximum conditioning possible for the US

V_X is the current associative strength of X

V_{tot} is the total associative strength of all stimuli present, that is, X plus any others
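
The update rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (rw_update, run_phase), the cue labels, and the parameter values (salience 0.3, beta 0.5, lambda 1.0, 50 trials per phase) are all assumptions chosen for the example. It runs simple acquisition with cue A, then a blocking design (A–US followed by AB–US) against a control that receives AB–US only.

```python
def rw_update(V, present, lam, alpha, beta):
    """Apply one Rescorla-Wagner trial and return the updated strengths.

    V       -- dict: cue name -> current associative strength
    present -- cues presented on this trial
    lam     -- lambda, the maximum conditioning the US supports (0 if no US)
    alpha   -- dict: cue name -> salience (illustrative values)
    beta    -- rate parameter for the US
    """
    v_tot = sum(V[c] for c in present)   # summed prediction of the US
    error = lam - v_tot                  # shared prediction error ("surprise")
    new_V = dict(V)
    for c in present:                    # only presented cues are updated
        new_V[c] = V[c] + alpha[c] * beta * error
    return new_V


def run_phase(V, present, lam, alpha, beta, n_trials):
    """Repeat the same trial type n_trials times."""
    for _ in range(n_trials):
        V = rw_update(V, present, lam, alpha, beta)
    return V


# Hypothetical parameter values, chosen only for illustration.
ALPHA = {"A": 0.3, "B": 0.3}
BETA = 0.5
LAMBDA = 1.0
START = {"A": 0.0, "B": 0.0}

# Simple acquisition: A is paired with the US.
acq = run_phase(START, ["A"], LAMBDA, ALPHA, BETA, 50)

# Blocking group: A-US first, then the compound AB-US.
blocked = run_phase(acq, ["A", "B"], LAMBDA, ALPHA, BETA, 50)

# Control group: compound AB-US only.
control = run_phase(START, ["A", "B"], LAMBDA, ALPHA, BETA, 50)

print("V_A after acquisition:", round(acq["A"], 4))
print("V_B, blocking group:  ", round(blocked["B"], 4))
print("V_B, control group:   ", round(control["B"], 4))
```

Because A already predicts the US almost perfectly after the first phase, the shared error term is close to zero during the compound trials, so B gains very little strength in the blocking group, while in the control group A and B split the available strength.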

The revised RW model by Van Hamme and Wasserman (1994)

Van Hamme and Wasserman extended the original Rescorla–Wagner (RW) model in 1994 by introducing a new factor.^[2] They suggested that it is not only conditioned stimuli physically present on a given trial that can undergo changes in associative strength: the associative value of a CS can also be altered through a within-compound association with a CS that is present on that trial. A within-compound association is established when two CSs are presented together during training (a compound stimulus). If one of the two component CSs is subsequently presented alone, it is assumed to activate a representation of the other (previously paired) CS as well. Van Hamme and Wasserman proposed that stimuli indirectly activated through within-compound associations have a negative learning parameter, which allows the model to explain phenomena of retrospective revaluation.

Consider the following example, an experimental paradigm called "backward blocking," indicative of retrospective revaluation, where AB is the compound stimulus A+B:

Phase 1: AB–US
Phase 2: A–US

Test trials: Group 1, which received both Phase 1 and Phase 2 trials, shows a weaker conditioned response (CR) to B than the control group, which received only Phase 1 trials.

The original RW model cannot account for this effect, but the revised model can. In Phase 2, stimulus B is indirectly activated through its within-compound association with A. Because B is not physically present, it takes a negative learning parameter in place of the positive one (usually called alpha) that it would have if it were presented. Since A alone still under-predicts the US at the start of Phase 2, the prediction error is positive, so B's associative strength declines whereas A's increases because of its positive learning parameter.

Thus, the revised RW model can explain why the CR elicited by B after backward blocking training is weaker compared with AB-only conditioning.
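
The backward blocking account can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the authors' own formulation: the function name vhw_update, the parameter values (0.3 for present cues, -0.1 for absent but expected cues, beta 0.5, lambda 1.0, 30 trials per phase), and the choice to compute the prediction error from physically present cues only are all assumptions made for the example.

```python
def vhw_update(V, present, absent_expected, lam,
               alpha_present, alpha_absent, beta):
    """One trial of a revised-RW style update.

    Present cues use a positive learning parameter; cues that are absent
    but activated through a within-compound association use a negative one.
    Assumption: the prediction error is computed from present cues only.
    """
    error = lam - sum(V[c] for c in present)
    new_V = dict(V)
    for c in present:
        new_V[c] = V[c] + alpha_present * beta * error
    for c in absent_expected:
        new_V[c] = V[c] + alpha_absent * beta * error
    return new_V


# Hypothetical parameter values, chosen only for illustration.
ALPHA_PRESENT = 0.3
ALPHA_ABSENT = -0.1
BETA = 0.5
LAMBDA = 1.0
N_TRIALS = 30

# Phase 1 (both groups): AB-US.
V = {"A": 0.0, "B": 0.0}
for _ in range(N_TRIALS):
    V = vhw_update(V, ["A", "B"], [], LAMBDA,
                   ALPHA_PRESENT, ALPHA_ABSENT, BETA)
control = dict(V)  # the control group stops here

# Phase 2 (Group 1 only): A-US, with B absent but expected.
group1 = dict(V)
for _ in range(N_TRIALS):
    group1 = vhw_update(group1, ["A"], ["B"], LAMBDA,
                        ALPHA_PRESENT, ALPHA_ABSENT, BETA)

print("V_B, control group:", round(control["B"], 3))
print("V_B, Group 1:      ", round(group1["B"], 3))
print("V_A, Group 1:      ", round(group1["A"], 3))
```

Passing an empty list for absent_expected in Phase 2 reduces the function to the original RW update, in which case B's strength would stay at its Phase 1 value.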

Some failures of the RW model

Spontaneous recovery from extinction and recovery from extinction caused by reminder treatments (reinstatement)

It is a well-established observation that a time-out interval after the completion of extinction results in partial recovery from extinction: the previously extinguished response recurs, but usually at a lower level than before extinction training. Reinstatement refers to the phenomenon that presenting the US used in training on its own, after the completion of extinction, also results in partial recovery from extinction. The RW model cannot account for these phenomena.

Extinction of a previously conditioned inhibitor

The RW model predicts that repeated presentation of a conditioned inhibitor alone (a CS with negative associative strength) results in extinction of this stimulus, that is, a decline in the magnitude of its negative associative value. This prediction is false. On the contrary, experiments show that repeated presentation of a conditioned inhibitor alone even increases its inhibitory potential.
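
The model's prediction follows directly from the update rule. As a worked example, assume an inhibitor X with an illustrative strength of V_X = -0.5 that is presented alone without the US, so that \lambda = 0 and V_{tot} = V_X:

```latex
\Delta V_X = \alpha_X \beta (\lambda - V_{tot})
           = \alpha_X \beta \bigl(0 - (-0.5)\bigr)
           = 0.5\,\alpha_X \beta > 0
```

Since \alpha_X and \beta are positive, every such trial moves V_X upward toward zero, which is the loss of inhibition that the experiments fail to show.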

Facilitated reacquisition after extinction

One of the assumptions of the model is that the conditioning history of a CS has no influence on its present status: only its current associative value matters. Contrary to this assumption, many experiments^[3] show that stimuli that were first conditioned and then extinguished are more easily reconditioned (i.e., fewer trials are necessary for conditioning).

The exclusiveness of excitation and inhibition

The RW model also assumes that excitation and inhibition are opposing features. A stimulus can have either excitatory potential (a positive associative strength) or inhibitory potential (a negative associative strength), but not both. By contrast, it is sometimes observed that stimuli can have both qualities. One example is backward excitatory conditioning, in which a CS is backwardly paired with a US (US–CS instead of CS–US). This usually makes the CS a conditioned excitor. The stimulus also has inhibitory features, which can be demonstrated with the retardation-of-acquisition test. This test is used to assess the inhibitory potential of a stimulus, since excitatory conditioning with a previously conditioned inhibitor is observed to be retarded. The backwardly conditioned stimulus passes this test and thus seems to have both excitatory and inhibitory features.

Pairing a novel stimulus with a conditioned inhibitor

A conditioned inhibitor is assumed to have a negative associative value. When an inhibitor is presented together with a novel stimulus (i.e., one whose associative strength is zero) and without the US, the model predicts that the novel cue should become a conditioned excitor. This is not the case in experimental situations. The prediction stems from the model's basic term (lambda-V). Since the summed associative strength of all stimuli (V) present on the trial is negative (zero + inhibitory potential) and lambda is zero (no US present), the resulting change in the associative strength is positive, thus making the novel cue a conditioned excitor.
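
A short simulation makes this prediction concrete. It is only a sketch: the cue labels, the starting value of -0.5 for the inhibitor, the shared salience of 0.3, beta of 0.5, and the 30 trials are hypothetical values chosen for illustration.

```python
# Hypothetical parameter values, chosen only for illustration.
alpha = 0.3   # salience, assumed equal for both cues
beta = 0.5    # rate parameter
lam = 0.0     # no US is presented, so lambda is zero

V = {"inhibitor": -0.5, "novel": 0.0}

for _ in range(30):
    # Both cues are present, so the error uses their summed strength.
    error = lam - sum(V.values())
    for cue in V:
        V[cue] += alpha * beta * error

print("novel cue:", round(V["novel"], 3))
print("inhibitor:", round(V["inhibitor"], 3))
```

Under these assumptions the positive error is shared equally, so the novel cue ends with a positive strength of the same magnitude as the inhibition that remains, and the summed strength of the compound approaches zero.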

CS-preexposure effect

The CS-preexposure effect (also called latent inhibition) is the well-established observation that conditioning is retarded when the stimulus used as the CS has previously been presented on its own. The RW model does not predict any effect of presenting a novel stimulus without a US: its associative strength and lambda are both zero, so no change occurs.

Higher-order conditioning

In higher-order conditioning, a previously conditioned CS is paired with a novel cue (i.e., first CS1–US, then CS2–CS1). This usually makes the novel cue CS2 elicit reactions similar to those elicited by CS1. The model cannot account for this phenomenon, since no US is present during CS2–CS1 trials. However, by allowing CS1 to act similarly to a US, one can reconcile the model with this effect.

Sensory preconditioning

Sensory preconditioning refers to first pairing two novel cues (CS1–CS2) and then pairing one of them with a US (CS2–US). This turns both CS1 and CS2 into conditioned excitors. The RW model cannot explain this, since during the CS1–CS2 phase both stimuli have an associative value of zero and lambda is also zero (no US present), which results in no change in the associative strength of the stimuli.
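
Written out with the update rule, and using CS1 as the example cue (the same holds for CS2), each trial of the CS1–CS2 phase gives:

```latex
\Delta V_{CS1} = \alpha_{CS1} \beta (\lambda - V_{tot})
               = \alpha_{CS1} \beta \bigl(0 - (0 + 0)\bigr)
               = 0
```

The result is zero whatever positive values the salience and rate parameters take, so CS1 enters the second phase with no strength and is never presented again before the test.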

Success and popularity

The Rescorla–Wagner model owes its success to several factors, including:^[4]

it has relatively few free parameters and independent variables
it can generate clear and ordinal predictions
it has made a number of successful predictions
cast in such terms as "prediction" and "surprise", the model has intuitive appeal
it has generated a great deal of research, including many new findings and alternative theories

References

Rescorla, R.A. & Wagner, A.R. (1972) A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement, Classical Conditioning II, A.H. Black & W.F. Prokasy, Eds., pp. 64–99. Appleton-Century-Crofts.

External links

Scholarpedia Rescorla–Wagner model
RW Simulator Rescorla-Wagner Model simulator
Rescorla-Wagner simulator in browser
Simulator with design

Notes and References

1. Hazy, Thomas E.; Frank, Michael J.; O'Reilly, Randall C. (2010-04-01). Neural Mechanisms Supporting Acquired Phasic Dopamine Responses in Learning: An Integrative Synthesis. Neuroscience and Biobehavioral Reviews. 34 (5): 701–720. doi:10.1016/j.neubiorev.2009.11.019. ISSN 0149-7634. PMC 2839018. PMID 19944716.
2. Van Hamme, L.J.; Wasserman, E.A. (1994). Cue competition in causality judgements: The role of nonpresentation of compound stimulus elements. Learning and Motivation. 25 (2): 127–151. doi:10.1006/lmot.1994.1008. Archived copy (2014-04-07): https://web.archive.org/web/20140407065145/http://psych.stanford.edu/~jlm/pdfs/causality/VanHammeWasserman04.pdf
3. Napier, R.M.; Macrae, M.; Kehoe, E.J. (1992). Rapid reacquisition in conditioning of the rabbit's nictitating membrane response. Journal of Experimental Psychology: Animal Behavior Processes. 18 (2): 182–192. doi:10.1037/0097-7403.18.2.182.
4. Miller, Ralph R.; Barnet, Robert C.; Grahame, Nicholas J. (1995). Assessment of the Rescorla-Wagner Model. Psychological Bulletin. 117 (3): 363–386. American Psychological Association. doi:10.1037/0033-2909.117.3.363. PMID 7777644.