The generalized Hebbian algorithm, also known in the literature as Sanger's rule, is a linear feedforward neural network for unsupervised learning with applications primarily in principal components analysis. First defined in 1989,[1] it is similar to Oja's rule in its formulation and stability, except it can be applied to networks with multiple outputs. The name originates because of the similarity between the algorithm and a hypothesis made by Donald Hebb[2] about the way in which synaptic strengths in the brain are modified in response to experience, i.e., that changes are proportional to the correlation between the firing of pre- and post-synaptic neurons.[3]
Consider a problem of learning a linear code for some data. Each datum is a multi-dimensional vector $x \in \mathbb{R}^n$ and is to be represented, at least approximately, as a linear combination of code vectors $w_1, \dots, w_m$. When $m = n$ the data can be represented exactly; when $m < n$ they can only be represented approximately. To minimize the L2 loss of this representation, the code vectors $w_1, \dots, w_m$ should be the highest principal component vectors of the data.
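Written out explicitly, and assuming (as is standard in this setting, though not stated above) that the data are zero-mean and the code vectors are orthonormal, the code of a datum and the loss being minimized are

$$y_k = w_k^{\mathsf T} x, \qquad \hat{x} = \sum_{k=1}^{m} y_k\, w_k, \qquad \min_{w_1,\dots,w_m}\; E\!\left[\, \Big\| x - \sum_{k=1}^{m} \big(w_k^{\mathsf T} x\big)\, w_k \Big\|^2 \right],$$

which is minimized exactly when $w_1, \dots, w_m$ span the subspace of the top $m$ principal components.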
The generalized Hebbian algorithm is an iterative algorithm to find the highest principal component vectors, in an algorithmic form that resembles unsupervised Hebbian learning in neural networks.
Consider a one-layered neural network with $n$ input neurons and $m$ output neurons $y_1, \dots, y_m$. The linear code vectors are the connection strengths: $w_{ij}$ is the synaptic weight between the $j$-th input neuron and the $i$-th output neuron.
The generalized Hebbian algorithm learning rule is of the form
$$\Delta w_{ij} \;=\; \eta\, y_i \left( x_j - \sum_{k=1}^{i} w_{kj}\, y_k \right)$$

where $\eta$ is the learning rate parameter.
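As a concrete sketch (not taken from the sources), the per-component rule can be written in NumPy as follows; the zero-mean data assumption, the constant learning rate, and the shape conventions are all illustrative choices:

```python
import numpy as np

def gha_update(W, x, eta=0.01):
    """One step of the generalized Hebbian algorithm (Sanger's rule).

    W   : (m, n) weight matrix; row i holds the code vector w_i
    x   : (n,) zero-mean input vector
    eta : learning rate
    """
    y = W @ x                                  # outputs y_i = sum_j w_ij x_j
    dW = np.empty_like(W)
    for i in range(W.shape[0]):
        # x_j minus the partial reconstruction from outputs 1..i
        residual = x - W[: i + 1].T @ y[: i + 1]
        dW[i] = eta * y[i] * residual
    return W + dW
```

Repeatedly applying this update to a stream of data vectors drives the rows of W toward the leading principal component directions, in order of decreasing variance.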
In matrix form, Oja's rule can be written
$$\frac{d\,w(t)}{dt} \;=\; w(t)\,Q \;-\; \operatorname{diag}\!\left[\, w(t)\, Q\, w(t)^{\mathsf T} \right] w(t)$$
and the Gram-Schmidt algorithm is
$$\Delta w(t) \;=\; -\operatorname{lower}\!\left[\, w(t)\, w(t)^{\mathsf T} \right] w(t)$$
where $w(t)$ is any matrix, in this case representing the synaptic weights, $Q$ is the autocorrelation matrix, simply the outer product of inputs $x\, x^{\mathsf T}$, $\operatorname{diag}$ is the function that diagonalizes a matrix, and $\operatorname{lower}$ is the function that sets all matrix elements on or above the diagonal equal to 0. We can combine these equations to get our original rule in matrix form,
$$\Delta w(t) \;=\; \eta(t) \left( y(t)\, x(t)^{\mathsf T} \;-\; \operatorname{LT}\!\left[\, y(t)\, y(t)^{\mathsf T} \right] w(t) \right)$$
where the function $\operatorname{LT}[\cdot]$ sets all matrix elements above the diagonal equal to 0, and note that our output $y(t) = w(t)\, x(t)$ is a linear neuron.[1]
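As a sketch, the matrix form translates almost verbatim into NumPy, with np.tril standing in for $\operatorname{LT}$ (it keeps the diagonal and everything below it); the constant learning rate is again an illustrative choice:

```python
import numpy as np

def gha_matrix_update(W, x, eta=0.01):
    """Matrix-form Sanger update: dW = eta * (y x^T - LT[y y^T] W)."""
    y = W @ x                                  # linear outputs, y = W x
    LT = np.tril(np.outer(y, y))               # zero all elements above the diagonal
    return W + eta * (np.outer(y, x) - LT @ W)
```

Row $i$ of $\operatorname{LT}[y\, y^{\mathsf T}]\, w$ equals $y_i \sum_{k \le i} y_k w_k$, so this reproduces the per-component rule given above.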
Oja's rule is the special case where $m = 1$.
With Oja's rule, $w_1$ learns to be a unit vector such that $E[x_j] = E[w_{1j}\, y_1]$ for each component $j$; that is, $w_1$ learns to encode a datum $x$ by the single scalar $y_1 = \sum_i w_{1i} x_i$ in a way that minimizes the reconstruction error $E\!\left[\|x - y_1 w_1\|^2\right]$, so $w_1$ converges to the first principal component vector.
When $m = 2$, the first output neuron learns $w_1$ as before, while the second output neuron effectively applies Oja's rule to the residual $x' = x - y_1 w_1$, the data with its projection onto $w_1$ removed, and therefore learns the second principal component. By induction, this results in finding the top-$m$ principal component vectors for arbitrary $m$.
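A small self-contained check of this claim (a toy experiment, not taken from the sources): run the matrix-form update on synthetic zero-mean Gaussian data and compare the learned rows with the top eigenvectors of the sample covariance matrix. All parameter values below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, steps, eta = 8, 3, 50_000, 1e-3

# Zero-mean synthetic data with covariance A A^T.
A = rng.normal(size=(n, n))
X = rng.normal(size=(steps, n)) @ A.T

W = rng.normal(scale=0.1, size=(m, n))
for x in X:
    y = W @ x
    W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

# Top-m eigenvectors of the sample covariance, largest eigenvalue first.
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
top = evecs[:, ::-1][:, :m].T

# Each row of W should align (up to sign) with the corresponding eigenvector,
# so these absolute inner products should approach 1 as training converges.
print(np.abs(np.sum(W * top, axis=1)))
```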
The generalized Hebbian algorithm is used in applications where a self-organizing map is necessary, or where a feature or principal components analysis can be used. Examples of such cases include artificial intelligence and speech and image processing.
Its importance comes from the fact that learning is a single-layer process; that is, a synaptic weight changes only depending on the response of the inputs and outputs of that layer, thus avoiding the multi-layer dependence associated with the backpropagation algorithm. It also has a simple and predictable trade-off between learning speed and accuracy of convergence as set by the learning rate parameter $\eta$.[4]
As an example, Olshausen and Field (1996)[6] performed the generalized Hebbian algorithm on 8-by-8 patches of photos of natural scenes, and found that it results in Fourier-like features. The features are the same as the principal components found by principal components analysis, as expected, since the features are determined by the $64 \times 64$ covariance matrix of the patches, that is, by the second-order statistics of the pixels.