ADALINE explained

ADALINE (Adaptive Linear Neuron, later Adaptive Linear Element) is an early single-layer artificial neural network and the name of the physical device that implemented this network.[1][2][3][4][5] It was developed by professor Bernard Widrow and his doctoral student Ted Hoff at Stanford University in 1960. It is based on the perceptron. It consists of weights, a bias and a summation function. The weights and biases were implemented by rheostats (as seen in the "knobby ADALINE"), and later by memistors.

The difference between ADALINE and the standard (McCulloch–Pitts) perceptron is in how they learn. ADALINE unit weights are adjusted to match a teacher signal before the Heaviside function is applied (see figure), whereas the standard perceptron unit weights are adjusted to match the correct output after the Heaviside function is applied.
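The contrast between the two learning schemes can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the learning rate `eta` and the sample numbers are assumptions made for the example and do not come from the original devices.

```python
def heaviside(s):
    """Step function: 1 if s >= 0, else 0."""
    return 1 if s >= 0 else 0

def weighted_sum(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def adaline_step(w, x, target, eta=0.1):
    """ADALINE: the error is measured on the raw weighted sum."""
    error = target - weighted_sum(w, x)
    return [wj + eta * error * xj for wj, xj in zip(w, x)]

def perceptron_step(w, x, target, eta=0.1):
    """Perceptron: the error is measured after thresholding."""
    error = target - heaviside(weighted_sum(w, x))
    return [wj + eta * error * xj for wj, xj in zip(w, x)]

# Hypothetical example: the weighted sum is 2.0, the thresholded output is 1.
w = [1.0, 1.0]
x = [1.0, 1.0]
w_adaline = adaline_step(w, x, target=1)        # sum differs from target, weights move
w_perceptron = perceptron_step(w, x, target=1)  # thresholded output is correct, no change
```

In this example the perceptron leaves the weights alone because the thresholded output already matches the target, while the ADALINE rule still pulls the weighted sum from 2.0 toward 1.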

A multilayer network of ADALINE units is a MADALINE.

Definition

ADALINE is a single-layer neural network with multiple nodes, where each node accepts multiple inputs and generates one output. Given the following variables:

x is the input vector

w is the weight vector

n is the number of inputs

\theta is some constant

y is the output of the model

then we find that the output is y = \sum_{j=1}^{n} x_j w_j + \theta. If we further assume that x_0 = 1 and w_0 = \theta, then the output further reduces to:

y = \sum_{j=0}^{n} x_j w_j
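As a quick check of the two forms of the output, the following Python sketch computes y both with an explicit \theta and with \theta folded into the weight vector as w_0. The function names and the sample numbers are hypothetical and chosen only for illustration.

```python
def adaline_output(x, w, theta):
    """y = sum_{j=1..n} x_j * w_j + theta."""
    return sum(xj * wj for xj, wj in zip(x, w)) + theta

def adaline_output_augmented(x, w, theta):
    """Same output, with x_0 = 1 and w_0 = theta folded into the vectors."""
    x_aug = [1.0] + list(x)
    w_aug = [theta] + list(w)
    return sum(xj * wj for xj, wj in zip(x_aug, w_aug))

# Hypothetical inputs, weights and constant.
x = [2.0, -1.0, 0.5]
w = [0.5, 1.0, -2.0]
theta = 0.25

y = adaline_output(x, w, theta)
y_aug = adaline_output_augmented(x, w, theta)
```

Both calls return the same value, which is why the constant is often treated as just another weight attached to a fixed input of 1.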

Learning rule

The learning rule used by ADALINE is the LMS ("least mean squares") algorithm, a special case of gradient descent.

Define the following notations:

\eta is the learning rate (some positive constant)

y is the output of the model

o is the target (desired) output

E = (o - y)^2 is the square of the error.

The LMS algorithm updates the weights by w \leftarrow w + \eta(o - y)x.

This update rule minimizes E, the square of the error,[6] and is in fact the stochastic gradient descent update for linear regression.[7]
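To see why the update is a gradient step, note that with y = \sum_j x_j w_j the gradient of E with respect to w is -2(o - y)x, so moving against the gradient, with the factor 2 absorbed into the learning rate, gives the LMS rule. The Python sketch below applies that rule sample by sample. It is a minimal illustration under stated assumptions: the function name `lms_train`, the data, the learning rate and the number of epochs are all hypothetical, and the bias is handled by the x_0 = 1 convention.

```python
def lms_train(samples, n_inputs, eta=0.1, epochs=500):
    """Fit weights w (w[0] plays the role of theta) with the LMS rule."""
    w = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        for x, o in samples:
            x_aug = [1.0] + list(x)                       # x_0 = 1
            y = sum(wj * xj for wj, xj in zip(w, x_aug))  # model output
            err = o - y                                   # target minus output
            # w <- w + eta * (o - y) * x
            w = [wj + eta * err * xj for wj, xj in zip(w, x_aug)]
    return w

# Hypothetical noise-free data generated from o = 1 + 2*x1 - 3*x2.
samples = [
    ((0.0, 0.0), 1.0),
    ((1.0, 0.0), 3.0),
    ((0.0, 1.0), -2.0),
    ((1.0, 1.0), 0.0),
]
w = lms_train(samples, n_inputs=2)
```

Because the data are exactly linear, the learned weights approach the generating coefficients (1, 2, -3). With noisy data the weights would instead hover around the least-squares solution.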

MADALINE

MADALINE (Many ADALINE[8]) is a three-layer (input, hidden, output), fully connected, feed-forward artificial neural network architecture for classification that uses ADALINE units in its hidden and output layers; that is, its activation function is the sign function.[9] The three-layer network uses memistors. Because the sign function is not differentiable, MADALINE networks cannot be trained with backpropagation. Three different training algorithms have been suggested instead, called Rule I, Rule II and Rule III.

Despite many attempts, Widrow's group never succeeded in training more than a single layer of weights in a MADALINE. This remained the case until Widrow saw the backpropagation algorithm at a 1985 Snowbird conference.[10]

MADALINE Rule 1 (MRI) - The first of these dates back to 1962.[11] The network consists of two layers. The first layer is made of ADALINE units. Let the output of the i-th ADALINE unit be o_i. The second layer has two units. One is a majority-voting unit: it takes in all o_i, and if there are more positives than negatives, it outputs +1, and vice versa. The other is a "job assigner". Suppose the desired output differs from the majority-voted output, say the desired output is -1. The job assigner then calculates the minimal number of ADALINE units that must change their outputs from positive to negative, picks the ADALINE units that are closest to being negative, and makes them update their weights according to the ADALINE learning rule. This was thought of as a form of the "minimal disturbance principle".
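The majority vote and the job assigner of Rule I can be sketched in Python as follows. This is an interpretation under stated assumptions, not the original hardware procedure: the number of units is assumed odd so the vote never ties, "closest to changing sign" is measured by the absolute value of the weighted sum, the selected units are trained toward the desired output with the LMS rule, and all names and numbers are hypothetical.

```python
def sign(s):
    return 1 if s >= 0 else -1

def weighted_sum(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def predict(units, x):
    """Majority vote over the sign outputs of the ADALINE units."""
    return sign(sum(sign(weighted_sum(w, x)) for w in units))

def mri_step(units, x, desired, eta=0.1):
    """One Rule I style update on one example; returns how many units were adapted."""
    sums = [weighted_sum(w, x) for w in units]
    outputs = [sign(s) for s in sums]
    if sign(sum(outputs)) == desired:
        return 0                                   # vote already correct, no disturbance
    # Job assigner: minimal number of units that must change sides.
    agree = sum(1 for o in outputs if o == desired)
    needed = len(units) // 2 + 1 - agree
    # Among the disagreeing units, pick those closest to changing sign.
    wrong = sorted((i for i, o in enumerate(outputs) if o != desired),
                   key=lambda i: abs(sums[i]))
    for i in wrong[:needed]:
        err = desired - sums[i]                    # ADALINE (LMS) error on the raw sum
        units[i] = [wj + eta * err * xj for wj, xj in zip(units[i], x)]
    return needed

# Hypothetical three-unit first layer; x[0] = 1 acts as the bias input.
units = [[0.2, 0.1], [-0.1, -0.3], [-0.5, -0.2]]
x = [1.0, 1.0]
desired = 1
updates = 0
for _ in range(100):                               # bounded loop for safety
    if mri_step(units, x, desired) == 0:
        break
    updates += 1
```

In this example only the middle unit, whose weighted sum is nearest zero among the dissenting units, is adapted; the other two units keep their weights, which is the minimal disturbance idea in miniature.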

The largest MADALINE machine built had 1000 weights, each implemented by a memistor. It was built in 1963 and used MRI for learning.[12]

Some MADALINE machines were demonstrated to perform inverted pendulum balancing, weather prediction, speech recognition, etc.

MADALINE Rule 2 (MRII) - The second training algorithm improved on Rule I and was described in 1988. The Rule II training algorithm is based on a principle called "minimal disturbance". It proceeds by looping over training examples, and for each example, it finds the hidden-layer unit (ADALINE classifier) with the lowest confidence in its prediction, tentatively flips the sign of that unit's output, accepts or rejects the change based on whether the network's error is reduced, and stops when the error is zero.

Additionally, when flipping single units' signs does not drive the error to zero for a particular example, the training algorithm starts flipping pairs of units' signs, then triples of units, etc.[8]

MADALINE Rule 3 - The third "Rule" applied to a modified network with sigmoid activations instead of signum; it was later found to be equivalent to backpropagation.[13]
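The single-unit trial flips of Rule II can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not taken from the 1988 paper: the output layer is a single sign unit with fixed weights, a unit's confidence is measured by the absolute value of its weighted sum, an accepted flip is made real by repeated LMS updates toward the flipped sign, and pair or triple flips are omitted. All names and numbers are hypothetical.

```python
def sign(s):
    return 1 if s >= 0 else -1

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def network_error(hidden_out, out_w, target):
    """0 if the sign output unit matches the target, else 1."""
    y = sign(dot(out_w, [1.0] + list(hidden_out)))
    return 0 if y == target else 1

def mrii_step(hidden, out_w, x, target, eta=0.1, max_adapt=50):
    """Trial-flip single hidden units on one example; returns the remaining error."""
    sums = [dot(w, x) for w in hidden]
    outs = [sign(s) for s in sums]
    err = network_error(outs, out_w, target)
    # Visit units from least to most confident.
    for i in sorted(range(len(hidden)), key=lambda k: abs(sums[k])):
        if err == 0:
            break                                  # stop when the error is zero
        trial = list(outs)
        trial[i] = -trial[i]                       # tentative flip
        if network_error(trial, out_w, target) >= err:
            continue                               # reject: error not reduced
        # Accept: adapt the unit's weights until its output really flips.
        for _ in range(max_adapt):
            s = dot(hidden[i], x)
            if sign(s) == trial[i]:
                break
            hidden[i] = [wj + eta * (trial[i] - s) * xj
                         for wj, xj in zip(hidden[i], x)]
        outs[i] = sign(dot(hidden[i], x))
        err = network_error(outs, out_w, target)
    return err

# Hypothetical network: three hidden units, output unit sums their signs.
hidden = [[0.3, 0.2], [-0.1, -0.3], [-0.6, -0.2]]
out_w = [0.0, 1.0, 1.0, 1.0]                       # fixed output weights (bias first)
x = [1.0, 1.0]
remaining = mrii_step(hidden, out_w, x, target=1)
```

Here the least confident hidden unit is the second one; flipping it corrects the output, so only its weights change and the step ends with zero error.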

Notes and References

  1. Anderson, James A.; Rosenfeld, Edward (2000). Talking Nets: An Oral History of Neural Networks. MIT Press. ISBN 9780262511117.
  2. YouTube: widrowlms: Science in Action.
  3. 1960: An adaptive "ADALINE" neuron using chemical "memistors". http://www-isl.stanford.edu/~widrow/papers/t1960anadaptive.pdf
  4. YouTube: widrowlms: The LMS algorithm and ADALINE. Part I - The LMS algorithm.
  5. YouTube: widrowlms: The LMS algorithm and ADALINE. Part II - ADALINE and memistor ADALINE.
  6. Adaline (Adaptive Linear). CS 4793: Introduction to Artificial Neural Networks. Department of Computer Science, University of Texas at San Antonio.
  7. Pfeffer, Avi. CS181 Lecture 5 — Perceptrons. Harvard University.
  8. Winter, Rodney; Widrow, Bernard (1988). MADALINE RULE II: A training algorithm for neural networks. IEEE International Conference on Neural Networks. pp. 401–408. doi:10.1109/ICNN.1988.23872.
  9. YouTube: widrowlms: Science in Action (Madaline is mentioned at the start and at 8:46).
  10. Anderson, James A.; Rosenfeld, Edward (2000). Talking Nets: An Oral History of Neural Networks. The MIT Press. ISBN 978-0-262-26715-1. doi:10.7551/mitpress/6626.003.0004.
  11. Widrow, Bernard (1962). Generalization and information storage in networks of adaline neurons. Self-organizing Systems. pp. 435–461.
  12. Widrow, B. (1987). Adaline and Madaline-1963, plenary speech. Proc. 1st IEEE Intl. Conf. on Neural Networks, Vol. 1, pp. 145–158, San Diego, CA, June 23, 1987.
  13. Widrow, Bernard; Lehr, Michael A. (1990). 30 years of adaptive neural networks: perceptron, madaline, and backpropagation. Proceedings of the IEEE. 78 (9): 1415–1442. doi:10.1109/5.58323. S2CID 195704643.