In machine learning, particularly in the creation of artificial neural networks, ensemble averaging is the process of creating multiple models and combining them to produce a desired output, as opposed to creating just one model. Frequently an ensemble of models performs better than any individual model, because the various errors of the models "average out."
Ensemble averaging is one of the simplest types of committee machines. Along with boosting, it is one of the two major types of static committee machines.[1] In contrast to standard network design, in which many networks are generated but only one is kept, ensemble averaging keeps the less satisfactory networks, but gives them less weight.[2] The theory of ensemble averaging relies on two properties of artificial neural networks:[3] in any single network, the bias can be reduced at the cost of increased variance; and in a group of networks, the variance can be reduced at no cost to the bias.
Ensemble averaging creates a group of networks, each with low bias and high variance, and then combines them into a new network that should have low bias and low variance. It thus offers a resolution of the bias-variance dilemma.[4] The idea of combining experts has been traced back to Pierre-Simon Laplace.[5]
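The variance-reduction argument above can be illustrated with a small simulation. This is only a sketch under a strong simplifying assumption: each "expert" is modelled as the true value plus independent zero-mean Gaussian noise, so it is unbiased but has high variance. The names `noisy_expert` and `ensemble_prediction` and all numeric settings are hypothetical. Real networks trained on the same data usually make correlated errors, so the reduction seen in practice is typically smaller than in this idealized case.

```python
import random
import statistics

def noisy_expert(true_value, noise_std, rng):
    """One unbiased but high-variance prediction of true_value (toy model)."""
    return true_value + rng.gauss(0.0, noise_std)

def ensemble_prediction(true_value, noise_std, n_experts, rng):
    """Average the predictions of n_experts independent toy experts."""
    predictions = [noisy_expert(true_value, noise_std, rng) for _ in range(n_experts)]
    return statistics.fmean(predictions)

rng = random.Random(0)  # fixed seed so the run is repeatable
true_value, noise_std, n_experts, n_trials = 1.0, 2.0, 10, 5000

single = [noisy_expert(true_value, noise_std, rng) for _ in range(n_trials)]
averaged = [ensemble_prediction(true_value, noise_std, n_experts, rng)
            for _ in range(n_trials)]

# Bias: how far the mean prediction is from the truth.
single_bias = statistics.fmean(single) - true_value
ensemble_bias = statistics.fmean(averaged) - true_value

# Variance: how much predictions scatter around their own mean.
# For independent errors, the ensemble variance is about noise_std**2 / n_experts.
single_var = statistics.pvariance(single)
ensemble_var = statistics.pvariance(averaged)

print(f"single expert: bias={single_bias:+.3f}, variance={single_var:.3f}")
print(f"ensemble of {n_experts}: bias={ensemble_bias:+.3f}, variance={ensemble_var:.3f}")
```

In this toy setting both the single expert and the ensemble stay close to zero bias, while the ensemble's variance is much smaller.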
The theory mentioned above suggests an obvious strategy: create a set of experts with low bias and high variance, and then average them. In practice, this means creating a set of experts with varying parameters; frequently, these are the initial synaptic weights, although other factors (such as the learning rate, momentum, etc.) may be varied as well. Some authors recommend against varying weight decay and early stopping. The steps are therefore: generate N experts, each with its own initial values (usually chosen randomly from a distribution); train each expert separately; and combine the experts by averaging their outputs.
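The procedure described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the tiny one-input tanh network, the sine target, the noise level, and every hyperparameter below are assumptions chosen for brevity, and the helper names (`init_expert`, `train`, `ensemble_predict`) are hypothetical. The experts differ only in their random initial synaptic weights and in the order in which they see the training samples.

```python
import math
import random

def init_expert(n_hidden, rng):
    """Random initial synaptic weights for a 1-input, 1-output tanh network."""
    return {
        "w1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def predict(net, x):
    hidden = [math.tanh(w * x + b) for w, b in zip(net["w1"], net["b1"])]
    return sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]

def train(net, data, epochs, lr, rng):
    """Plain stochastic gradient descent on squared error."""
    data = list(data)  # copy so shuffling does not affect the caller
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            hidden = [math.tanh(w * x + b) for w, b in zip(net["w1"], net["b1"])]
            err = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"] - t
            for i, h in enumerate(hidden):
                grad_pre = err * net["w2"][i] * (1.0 - h * h)  # backprop through tanh
                net["w2"][i] -= lr * err * h
                net["w1"][i] -= lr * grad_pre * x
                net["b1"][i] -= lr * grad_pre
            net["b2"] -= lr * err
    return net

def mse(predict_fn, data):
    return sum((predict_fn(x) - t) ** 2 for x, t in data) / len(data)

rng = random.Random(0)
train_xs = [rng.uniform(-3, 3) for _ in range(40)]
train_set = [(x, math.sin(x) + rng.gauss(0, 0.1)) for x in train_xs]
test_set = [(i / 10.0, math.sin(i / 10.0)) for i in range(-30, 31)]

# Step 1: generate N experts, each with its own random initial weights.
experts = [init_expert(n_hidden=5, rng=rng) for _ in range(5)]

# Step 2: train each expert separately.
for net in experts:
    train(net, train_set, epochs=200, lr=0.02, rng=rng)

# Step 3: combine the experts by averaging their outputs.
def ensemble_predict(x):
    return sum(predict(net, x) for net in experts) / len(experts)

individual_mse = [mse(lambda x, net=net: predict(net, x), test_set) for net in experts]
ensemble_mse = mse(ensemble_predict, test_set)
print("individual test MSE:", [round(e, 4) for e in individual_mse])
print("ensemble test MSE:  ", round(ensemble_mse, 4))
```

For squared error, the ensemble's error can never exceed the mean of the individual errors, because the squared-error function is convex. Whether the ensemble also beats the best single expert depends on how different the experts' errors are.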
Alternatively, domain knowledge may be used to generate several classes of experts. An expert from each class is trained, and the trained experts are then combined.
A more complex version of ensemble averaging views the final result not as a mere average of all the experts, but rather as a weighted sum. If each expert is $y_i$, then the overall result $\tilde{y}$ can be defined as

$$\tilde{y}(x;\alpha) = \sum_{j=1}^{p} \alpha_j y_j(x)$$

where $\alpha$ is a set of weights.
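The weighted sum can be sketched as follows. The three "experts" here are hypothetical stand-in functions with deliberately different systematic errors, not trained networks, and fitting the weights by batch gradient descent on mean squared error is just one illustrative choice; `combine` and `fit_alpha` are names invented for this example.

```python
import math

def combine(experts, alpha, x):
    """Weighted sum: y~(x; alpha) = sum_j alpha_j * y_j(x)."""
    return sum(a * expert(x) for a, expert in zip(alpha, experts))

def fit_alpha(experts, data, lr=0.05, steps=2000):
    """Fit alpha by batch gradient descent on mean squared error (illustrative)."""
    p = len(experts)
    alpha = [1.0 / p] * p  # start from the simple average
    outputs = [[expert(x) for expert in experts] for x, _ in data]
    for _ in range(steps):
        grad = [0.0] * p
        for ys, (_, t) in zip(outputs, data):
            err = sum(a * y for a, y in zip(alpha, ys)) - t
            for j in range(p):
                grad[j] += 2.0 * err * ys[j] / len(data)
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
    return alpha

def mse(experts, alpha, data):
    return sum((combine(experts, alpha, x) - t) ** 2 for x, t in data) / len(data)

# Hypothetical experts approximating sin(x), each wrong in a different way.
experts = [
    lambda x: math.sin(x) + 0.3,   # biased upward
    lambda x: math.sin(x) - 0.1,   # biased downward
    lambda x: 0.5 * math.sin(x),   # amplitude too small
]
data = [(i / 10.0, math.sin(i / 10.0)) for i in range(-30, 31)]

uniform_alpha = [1.0 / len(experts)] * len(experts)
fitted_alpha = fit_alpha(experts, data)

uniform_mse = mse(experts, uniform_alpha, data)
fitted_mse = mse(experts, fitted_alpha, data)
print("uniform weights:", uniform_alpha, "MSE:", uniform_mse)
print("fitted weights: ", fitted_alpha, "MSE:", fitted_mse)
```

In this toy case the weights are fitted and evaluated on the same data, which is enough to show the mechanics; in practice a separate held-out set would normally be used so that the weights do not simply reward overfitted experts.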
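It can be seen that most forms of neural networks are special cases of a linear combination: the standard neural net (where only one expert is used) is simply a linear combination with all $\alpha_j = 0$ except for a single $\alpha_k = 1$. A raw average is the case where all $\alpha_j$ are equal to the same constant, namely one over the total number of experts.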
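These special cases are easy to check numerically. In the sketch below, the expert outputs are made-up numbers for a single input, and `combine` is a hypothetical helper that applies the weighted sum to already-computed outputs.

```python
def combine(outputs, alpha):
    """Weighted sum of expert outputs: sum_j alpha_j * y_j."""
    return sum(a * y for a, y in zip(alpha, outputs))

outputs = [0.9, 1.2, 1.5]  # hypothetical expert outputs y_j(x) for one input x
p = len(outputs)

# Single network: every alpha_j is 0 except one alpha_k = 1 (here the second expert).
single_expert = [0.0, 1.0, 0.0]

# Simple ensemble average: every alpha_j equals 1/p.
simple_average = [1.0 / p] * p

print("single expert: ", combine(outputs, single_expert))
print("simple average:", combine(outputs, simple_average))
```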
A more recent ensemble averaging method is negative correlation learning,[6] proposed by Y. Liu and X. Yao. This method has since been widely used in evolutionary computing.