In statistics, jackknife variance estimates for random forests are a way to estimate the variance of random forest model predictions while eliminating the bootstrap (Monte Carlo) effects.
The sampling variance of bagged learners is:
V(x) = \mathrm{Var}[\hat{\theta}^{\infty}(x)]
The jackknife variance estimate is:

\hat{V}_J = \frac{n-1}{n} \sum_{i=1}^{n} \left( \hat{\theta}_{(-i)} - \bar{\theta} \right)^2
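As a minimal sketch of the delete-one formula above (using only NumPy; the `jackknife_variance` helper is hypothetical, introduced here for illustration):

```python
import numpy as np

def jackknife_variance(sample, estimator):
    """Delete-one jackknife variance of an estimator:
    V_J = (n - 1) / n * sum_i (theta_(-i) - theta_bar)^2,
    where theta_(-i) is the estimate with observation i left out
    and theta_bar is the mean of the leave-one-out estimates."""
    sample = np.asarray(sample)
    n = len(sample)
    # Leave-one-out estimates theta_(-i)
    theta_loo = np.array([estimator(np.delete(sample, i)) for i in range(n)])
    theta_bar = theta_loo.mean()
    return (n - 1) / n * np.sum((theta_loo - theta_bar) ** 2)

# Sanity check: for the sample mean, the jackknife variance
# coincides with the classical estimate var(sample) / n.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
v_jack = jackknife_variance(x, np.mean)
v_classic = x.var(ddof=1) / len(x)
```

For the sample mean the two estimates agree exactly, which makes this a convenient check of the implementation.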
Applied to a random forest, the estimate becomes:

\hat{V}_J = \frac{n-1}{n} \sum_{i=1}^{n} \left( \bar{t}^{\star}_{(-i)}(x) - \bar{t}^{\star}(x) \right)^2

where \bar{t}^{\star}(x) is the average of the tree predictions t^{\star}(x) over the whole forest, and \bar{t}^{\star}_{(-i)}(x) is the average over only those trees whose bootstrap sample does not contain the ith observation.
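A toy sketch of this jackknife-after-bootstrap computation, under the simplifying assumption that each "tree" just predicts the mean of its bootstrap sample (a stand-in for a real tree's prediction t_b*(x) at a fixed point x):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy setup: n observations, B bootstrap "trees".
n, B = 30, 2000
y = rng.normal(size=n)

boot_idx = rng.integers(0, n, size=(B, n))   # bootstrap sample for each tree
t_star = y[boot_idx].mean(axis=1)            # per-tree predictions t_b*(x)

# in_bag[b, i] is True if observation i appears in tree b's bootstrap sample.
in_bag = np.zeros((B, n), dtype=bool)
for b in range(B):
    in_bag[b, boot_idx[b]] = True

t_bar = t_star.mean()

# Average the predictions of trees whose bootstrap sample
# does NOT contain observation i.
t_bar_minus_i = np.array([t_star[~in_bag[:, i]].mean() for i in range(n)])

V_J = (n - 1) / n * np.sum((t_bar_minus_i - t_bar) ** 2)
```

With B large, roughly a fraction e^{-1} of the trees exclude any given observation, so each leave-one-out average is taken over a few hundred trees here.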
E-mail spam classification is a common benchmark problem in which 57 features are used to classify e-mails as spam or non-spam. Applying the IJ-U variance formula to evaluate the accuracy of random forests with m = 5, 19, and 57 features considered per split, the paper (Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife) shows that the m = 57 random forest appears to be quite unstable, while the predictions made by the m = 5 random forest appear to be quite stable. This corresponds to the evaluation made by error rate, in which the accuracy of the m = 5 model is high and that of the m = 57 model is low.
Here, accuracy is measured by error rate, which is defined as:

\mathrm{ErrorRate} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij},

where y_{ij} is an indicator that equals 1 if the ith observation is misclassified into the jth class and 0 otherwise.
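For illustration, the error rate can be computed from misclassification indicators as follows (toy labels and predictions, not data from the paper):

```python
import numpy as np

# Hypothetical true and predicted class labels for N = 5 observations,
# M = 3 classes.
y_true = np.array([0, 1, 2, 1, 0])
y_pred = np.array([0, 2, 2, 1, 1])
N, M = len(y_true), 3

# y[i, j] = 1 if observation i is misclassified into class j, else 0.
y = np.zeros((N, M))
for i in range(N):
    if y_pred[i] != y_true[i]:
        y[i, y_pred[i]] = 1

# ErrorRate = (1/N) * sum_i sum_j y_ij
error_rate = y.sum() / N
```

Two of the five observations are misclassified here, so the error rate is 0.4.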
Accuracy can also be measured by log loss:

\mathrm{logloss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij}),

where y_{ij} equals 1 if the ith observation belongs to the jth class and 0 otherwise, and p_{ij} is the predicted probability that the ith observation belongs to the jth class.
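A minimal sketch of the log-loss computation on hypothetical one-hot labels and predicted probabilities (toy values, not data from the paper):

```python
import numpy as np

# One-hot labels y[i, j] and predicted probabilities p[i, j];
# each row of p sums to 1.
y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
N = y.shape[0]

# logloss = -(1/N) * sum_i sum_j y_ij * log(p_ij)
logloss = -np.sum(y * np.log(p)) / N
```

Only the probability assigned to the true class of each observation contributes, since y_{ij} is zero elsewhere.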
When using Monte Carlo MSEs for estimating \hat{V}_{IJ}^{\infty} and \hat{V}_{J}^{\infty}, the Monte Carlo bias introduced by using a finite number of trees B must be taken into account:

E[\hat{V}_{IJ}^{B}] \approx \hat{V}_{IJ}^{\infty} + \frac{n \sum_{b=1}^{B} \left( t_b^{\star}(x) - \bar{t}^{\star}(x) \right)^2}{B^2}
To eliminate this bias, the bias-corrected estimate is:

\hat{V}_{IJ\text{-}U}^{B} = \hat{V}_{IJ}^{B} - \frac{n \sum_{b=1}^{B} \left( t_b^{\star}(x) - \bar{t}^{\star}(x) \right)^2}{B^2}
Similarly, the bias-corrected jackknife estimate is:

\hat{V}_{J\text{-}U}^{B} = \hat{V}_{J}^{B} - (e - 1) \frac{n \sum_{b=1}^{B} \left( t_b^{\star}(x) - \bar{t}^{\star}(x) \right)^2}{B^2}
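The two corrections share the common factor n Σ_b (t_b*(x) − t̄*(x))² / B², so they can be sketched together. The `bias_corrected` helper below is hypothetical, introduced here for illustration:

```python
import numpy as np

def bias_corrected(V_IJ, V_J, t_star, n):
    """Apply the Monte Carlo bias corrections to the infinitesimal
    jackknife (IJ) and jackknife (J) variance estimates, given the
    per-tree predictions t_star = [t_1*(x), ..., t_B*(x)] and the
    training-set size n."""
    t_star = np.asarray(t_star)
    B = len(t_star)
    # Shared correction term: n * sum_b (t_b* - t_bar*)^2 / B^2
    correction = n * np.sum((t_star - t_star.mean()) ** 2) / B**2
    return V_IJ - correction, V_J - (np.e - 1.0) * correction
```

When all trees predict the same value the correction vanishes; otherwise both estimates shrink, the jackknife one by the larger factor (e − 1).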