BrownBoost is a boosting algorithm that may be robust to noisy datasets. BrownBoost is an adaptive version of the boost by majority algorithm. As is true for all boosting algorithms, BrownBoost is used in conjunction with other machine learning methods. BrownBoost was introduced by Yoav Freund in 2001.[1]
AdaBoost performs well on a variety of datasets; however, it can be shown that AdaBoost does not perform well on noisy datasets.[2] This is a result of AdaBoost's focus on examples that are repeatedly misclassified. In contrast, BrownBoost effectively "gives up" on examples that are repeatedly misclassified. The core assumption of BrownBoost is that noisy examples will be repeatedly misclassified by the weak hypotheses, while non-noisy examples will be classified correctly often enough not to be "given up on." Thus only noisy examples are "given up on," whereas non-noisy examples contribute to the final classifier. If the final classifier is in effect learned from the non-noisy examples alone, its generalization error may be much better than that of a classifier learned from both noisy and non-noisy examples.
The user of the algorithm can set the amount of error to be tolerated in the training set. Thus, if the training set is noisy (say 10% of all examples are assumed to be mislabeled), the booster can be told to accept a 10% error rate. Since the noisy examples may be ignored, only the correctly labeled examples contribute to the learning process.
BrownBoost uses a non-convex potential loss function, so it does not fit into the AdaBoost framework. The non-convex optimization provides a way to avoid overfitting noisy datasets. However, in contrast to boosting algorithms that analytically minimize a convex loss function (e.g. AdaBoost and LogitBoost), BrownBoost solves a system of two equations in two unknowns using standard numerical methods.
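To make the contrast with AdaBoost concrete, the following minimal Python sketch compares how the two algorithms weight an example as its margin becomes more negative. It assumes the BrownBoost weight $e^{-(r+s)^2/c}$ and the usual AdaBoost weight $e^{-r}$ for an example with margin $r$; the function names and the values of `c` and `s` are hypothetical and chosen only for illustration.

```python
import math


def adaboost_weight(margin):
    """AdaBoost-style exponential weight: keeps growing as the margin falls."""
    return math.exp(-margin)


def brownboost_weight(margin, s, c):
    """BrownBoost-style weight exp(-(margin + s)^2 / c).

    The weight peaks at margin == -s and decays on both sides, so an example
    that keeps being misclassified eventually loses influence.
    """
    return math.exp(-((margin + s) ** 2) / c)


if __name__ == "__main__":
    c, s = 4.0, 1.0  # hypothetical total time and time remaining
    for margin in (2.0, 0.0, -1.0, -3.0, -6.0):
        print(margin, adaboost_weight(margin), brownboost_weight(margin, s, c))
```

The AdaBoost weight increases monotonically as the margin decreases, whereas the BrownBoost weight shrinks again once the margin falls well below $-s$, which is the sense in which such examples are "given up on."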
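The only parameter of BrownBoost ($c$ in the algorithm) is the "time" the algorithm runs. In the theory of BrownBoost, each hypothesis takes a variable amount of time ($t$ in the algorithm), which is directly related to the weight $\alpha$ given to the hypothesis. The time parameter in BrownBoost is analogous to the number of iterations $T$ in AdaBoost.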
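A larger value of $c$ means that BrownBoost treats the data as less noisy and therefore gives up on fewer examples. Conversely, a smaller value of $c$ means that BrownBoost treats the data as noisier and gives up on more examples.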
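During each iteration of the algorithm, a hypothesis is selected with some advantage over random guessing. The weight of this hypothesis, $\alpha$, and the amount of time that passes during the iteration, $t$, are solved for simultaneously in a system of two nonlinear equations with two unknowns ($\alpha$ and $t$): one equation decorrelates the hypothesis from the example weights, and the other holds the potential constant. Once these equations are solved, the margin of each example ($r_i(x_j)$ in the algorithm) and the amount of time remaining, $s$, are updated accordingly. This process is repeated until no time remains.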
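The initial potential is defined to be $\frac{1}{m}\sum_{j=1}^{m}\left(1-\mathrm{erf}(\sqrt{c})\right)=1-\mathrm{erf}(\sqrt{c})$. Since each iteration is constrained to hold the potential constant, the final potential is $\frac{1}{m}\sum_{j=1}^{m}\left(1-\mathrm{erf}(r_i(x_j)/\sqrt{c})\right)=1-\mathrm{erf}(\sqrt{c})$. Thus the final error is likely to be near $1-\mathrm{erf}(\sqrt{c})$. However, the final potential function is not the 0-1 loss function. For the final error to be exactly $1-\mathrm{erf}(\sqrt{c})$, the variance of the loss function must decrease linearly with respect to time, so that it forms the 0-1 loss function at the end of the boosting iterations. This is not yet discussed in the literature and is not part of the algorithm definition below.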
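Because the final error is expected to be near $1-\mathrm{erf}(\sqrt{c})$, one way to choose $c$ is to invert this relation for a tolerated training error. The sketch below does so by bisection. The function names are hypothetical, and treating the tolerated error as exactly $1-\mathrm{erf}(\sqrt{c})$ is a simplifying assumption rather than a guarantee of the algorithm.

```python
import math


def final_error(c):
    """Approximate final error 1 - erf(sqrt(c)) for a given time parameter c."""
    return 1.0 - math.erf(math.sqrt(c))


def time_for_target_error(eps, lo=0.0, hi=50.0, tol=1e-12):
    """Find c with 1 - erf(sqrt(c)) == eps by bisection.

    final_error is strictly decreasing in c, so the root is unique.
    Very small eps values are limited by floating-point saturation of erf.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if final_error(mid) > eps:
            lo = mid  # error still too high: more time is needed
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for eps in (0.20, 0.10, 0.01):
        print(eps, time_for_target_error(eps))
```

A lower tolerated error yields a larger $c$, consistent with larger values of $c$ treating the data as less noisy.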
The final classifier is a linear combination of weak hypotheses and is evaluated in the same manner as most other boosting algorithms.
Input:
$m$ training examples $(x_1,y_1),\ldots,(x_m,y_m)$, where $x_j\in X$ and $y_j\in Y=\{-1,+1\}$
The parameter $c$
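Initialise:
$s=c$ (the value of $s$ is the amount of time remaining)
$r_i(x_j)=0$ for all $j$ (the value of $r_i(x_j)$ is the margin at iteration $i$ for example $x_j$)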
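While $s>0$:
Set the weight of each example: $W_i(x_j)=e^{-\frac{(r_i(x_j)+s)^2}{c}}$, where $r_i(x_j)$ is the margin of example $x_j$.
Find a classifier $h_i:X\to\{-1,+1\}$ such that $\sum_j W_i(x_j)h_i(x_j)y_j>0$.
Find values $\alpha,t$ that satisfy the equation $\sum_j h_i(x_j)y_j\,e^{-\frac{(r_i(x_j)+\alpha h_i(x_j)y_j+s-t)^2}{c}}=0$. (This is similar to the condition $E_{W_{i+1}}[h_i(x_j)y_j]=0$ set forth by Schapire and Singer. In this setting, we numerically find the $W_{i+1}=\exp\left(\frac{\cdots}{\cdots}\right)$ such that $E_{W_{i+1}}[h_i(x_j)y_j]=0$.) This update is subject to the constraint $\sum_j\left(\Phi\left(r_i(x_j)+\alpha h_i(x_j)y_j+s-t\right)-\Phi\left(r_i(x_j)+s\right)\right)=0$, where $\Phi(z)=1-\mathrm{erf}(z/\sqrt{c})$ is the potential loss for a point with margin $r_i(x_j)$.
Update the margin of each example: $r_{i+1}(x_j)=r_i(x_j)+\alpha h_i(x_j)y_j$.
Update the time remaining: $s=s-t$.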
Output: $H(x)=\mathrm{sign}\left(\sum_i\alpha_i h_i(x)\right)$
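The steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the weak learner (a threshold stump on a single scalar feature), all function names, the nested-bisection solver, the iteration caps, and the value `c=1.35` are choices made here for illustration. In addition, when the potential constraint would require more time than remains, the sketch stops the step at $t=s$, which is an assumption of this sketch and is not spelled out in the definition above.

```python
import math


def stump_predict(x, threshold, sign):
    """Threshold stump on a scalar: `sign` above the threshold, `-sign` otherwise."""
    return sign if x > threshold else -sign


def best_stump(xs, ys, weights):
    """Hypothetical weak learner: the stump with the largest weighted correlation."""
    pts = sorted(set(xs))
    thresholds = [pts[0] - 1.0] + [(p + q) / 2 for p, q in zip(pts, pts[1:])]
    best = None
    for th in thresholds:
        for sign in (1, -1):
            corr = sum(w * stump_predict(x, th, sign) * y
                       for w, x, y in zip(weights, xs, ys))
            if best is None or corr > best[0]:
                best = (corr, th, sign)
    return best


def potential_sum(a, b, alpha, t, c):
    """Sum of erf(z_j / sqrt(c)); holding it fixed holds the total potential fixed."""
    return sum(math.erf((aj + alpha * bj - t) / math.sqrt(c)) for aj, bj in zip(a, b))


def correlation(a, b, alpha, t, c):
    """Decorrelation equation, with a_j = r_i(x_j) + s and b_j = h_i(x_j) * y_j."""
    return sum(bj * math.exp(-((aj + alpha * bj - t) ** 2) / c) for aj, bj in zip(a, b))


def solve_t(a, b, alpha, c, target):
    """Bisection on t: potential_sum decreases in t and its root lies in [-alpha, alpha]."""
    lo, hi = -alpha, alpha
    for _ in range(50):
        mid = (lo + hi) / 2
        if potential_sum(a, b, alpha, mid, c) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def solve_alpha_t(a, b, s, c):
    """Nested bisection for (alpha, t); the step is cut short if the time s runs out."""
    target = potential_sum(a, b, 0.0, 0.0, c)

    def done(alpha):
        t = solve_t(a, b, alpha, c, target)
        return (correlation(a, b, alpha, t, c) <= 0 or t >= s), t

    lo, hi = 0.0, 0.1
    for _ in range(30):  # grow the bracket until alpha overshoots
        if done(hi)[0]:
            break
        lo, hi = hi, 2 * hi
    for _ in range(40):  # shrink the bracket onto the crossing point
        mid = (lo + hi) / 2
        if done(mid)[0]:
            hi = mid
        else:
            lo = mid
    return hi, min(done(hi)[1], s)


def brownboost(xs, ys, c, max_rounds=30):
    """Run the loop described above; returns a list of (alpha, threshold, sign)."""
    s, margins, ensemble = c, [0.0] * len(xs), []
    for _ in range(max_rounds):  # safety cap, not part of the definition
        if s <= 1e-9:
            break
        weights = [math.exp(-((r + s) ** 2) / c) for r in margins]
        corr, th, sign = best_stump(xs, ys, weights)
        if corr <= 0:  # no hypothesis with an advantage is available
            break
        b = [stump_predict(x, th, sign) * y for x, y in zip(xs, ys)]
        a = [r + s for r in margins]
        alpha, t = solve_alpha_t(a, b, s, c)
        margins = [r + alpha * bj for r, bj in zip(margins, b)]
        s -= t
        ensemble.append((alpha, th, sign))
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, th, sign) for alpha, th, sign in ensemble)
    return 1 if score > 0 else -1


if __name__ == "__main__":
    xs = [float(v) for v in range(10)]
    ys = [-1] * 5 + [1] * 5
    ys[2] = 1  # one deliberately mislabeled example
    model = brownboost(xs, ys, c=1.35)
    print(len(model), [predict(model, x) for x in xs])
```

The solver relies on the observation that, for a fixed $\alpha$, the potential constraint determines $t$ uniquely, so the two-unknown system reduces to a one-dimensional search over $\alpha$. Newton's method or another root finder could be substituted for the bisection.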
In preliminary experiments with noisy datasets, BrownBoost achieved lower generalization error than AdaBoost; however, LogitBoost performed as well as BrownBoost.[4] An implementation of BrownBoost can be found in the open-source software JBoost.