Pseudo-R-squared explained

Pseudo-R-squared values are used when the outcome variable is nominal or ordinal, so that the coefficient of determination R² cannot be applied as a measure of goodness of fit, and when a likelihood function is used to fit the model.

In linear regression, the squared multiple correlation, R², is used to assess goodness of fit, as it represents the proportion of variance in the criterion that is explained by the predictors.^[1] In logistic regression analysis, there is no agreed-upon analogous measure, but there are several competing measures, each with limitations.^[1] ^[2]

Four of the most commonly used indices and one less commonly used one are examined in this article:

Likelihood ratio R²_L
Cox and Snell R²_CS
Nagelkerke R²_N
McFadden R²_McF
Tjur R²

R²_L by Cohen

R²_L is given by Cohen:^[1]

\begin{align}
R^2_L = \frac{D_\text{null} - D_\text{fitted}}{D_\text{null}}
\end{align}

This is the index most analogous to the squared multiple correlation in linear regression.^[3] It represents the proportional reduction in the deviance, where the deviance is treated as a measure of variation analogous but not identical to the variance in linear regression analysis.^[3] One limitation of the likelihood ratio R² is that it is not monotonically related to the odds ratio,^[1] meaning that it does not necessarily increase as the odds ratio increases and does not necessarily decrease as the odds ratio decreases.
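As a rough illustration of the ratio above, the following sketch computes R²_L from the deviances of a null and a fitted model. It assumes ungrouped binary outcomes, for which the deviance reduces to D = −2 ln L, and it takes the null model to predict the observed event proportion for every case. The function names and the toy data are hypothetical and do not come from the cited sources.

```python
import math

def bernoulli_log_likelihood(y, p):
    """Log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def likelihood_ratio_r2(y, p_fitted):
    """R^2_L = (D_null - D_fitted) / D_null, assuming D = -2 ln L."""
    base_rate = sum(y) / len(y)  # intercept-only prediction
    d_null = -2.0 * bernoulli_log_likelihood(y, [base_rate] * len(y))
    d_fitted = -2.0 * bernoulli_log_likelihood(y, p_fitted)
    return (d_null - d_fitted) / d_null

# Toy data (hypothetical): four observations with made-up fitted probabilities.
y = [0, 0, 1, 1]
p_fitted = [0.2, 0.3, 0.7, 0.8]
r2_l = likelihood_ratio_r2(y, p_fitted)
print(round(r2_l, 3))
```

A model that predicts the base rate for every case gives R²_L = 0 under these assumptions, since its deviance equals the null deviance.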

R²_CS by Cox and Snell

R²_CS is an alternative index of goodness of fit related to the R² value from linear regression.^[2] It is given by:

\begin{align}
R^2_\text{CS} &= 1 - \left(\frac{L_0}{L_M}\right)^{2/n} \\[5pt]
&= 1 - \exp\left(\frac{2}{n}\left(\ln(L_0) - \ln(L_M)\right)\right)
\end{align}

where L_M and L₀ are the likelihoods for the model being fitted and the null model, respectively. The Cox and Snell index corresponds to the standard R² in the case of a linear model with normal errors. In certain situations, R²_CS may be problematic because its maximum value, reached when L_M = 1, is 1 − L₀^(2/n) rather than 1. For example, for logistic regression, the upper bound is R²_CS ≤ 0.75 for a symmetric marginal distribution of events, and it decreases further for an asymmetric distribution of events.
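The 0.75 bound can be reproduced with a short sketch. It assumes ungrouped binary data and an intercept-only null model, so that ln L₀ = n·(q ln q + (1 − q) ln(1 − q)) for an event proportion q. The function names and the sample size are hypothetical.

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    """R^2_CS = 1 - exp((2/n) * (ln L_0 - ln L_M))."""
    return 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))

def cox_snell_upper_bound(ll_null, n):
    """Largest attainable R^2_CS, reached when L_M = 1 (ln L_M = 0)."""
    return 1.0 - math.exp((2.0 / n) * ll_null)

def null_log_likelihood(n, base_rate):
    """ln L_0 of an intercept-only model (assumed: ungrouped binary data)."""
    return n * (base_rate * math.log(base_rate)
                + (1.0 - base_rate) * math.log(1.0 - base_rate))

n = 200  # hypothetical sample size; the bound does not depend on it here
bound_symmetric = cox_snell_upper_bound(null_log_likelihood(n, 0.5), n)
bound_asymmetric = cox_snell_upper_bound(null_log_likelihood(n, 0.1), n)
print(bound_symmetric, bound_asymmetric)
```

Working with log-likelihoods instead of raw likelihoods is a deliberate choice in this sketch: L₀ itself underflows to zero in floating point for even moderately large n.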

R²_N by Nagelkerke

R²_N, proposed by Nico Nagelkerke in a highly cited Biometrika paper,^[4] provides a correction to the Cox and Snell R² so that the maximum value is equal to 1. Nevertheless, the Cox and Snell and likelihood ratio R²s show greater agreement with each other than either does with the Nagelkerke R².^[1] Of course, this might not be the case for values exceeding 0.75, as the Cox and Snell index is capped at this value. The likelihood ratio R² is often preferred to the alternatives as it is most analogous to R² in linear regression, is independent of the base rate (both the Cox and Snell and the Nagelkerke R²s increase as the proportion of cases increases from 0 to 0.5), and varies between 0 and 1.
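The correction is commonly written as a rescaling of the Cox and Snell index by its upper bound, R²_N = R²_CS / (1 − L₀^(2/n)). That formula is not spelled out in the text above, so the sketch below should be read as an illustration under that assumption; the function name and the log-likelihood values are hypothetical.

```python
import math

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox and Snell index divided by its maximum (assumed form of R^2_N)."""
    r2_cs = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))
    r2_max = 1.0 - math.exp((2.0 / n) * ll_null)
    return r2_cs / r2_max

n = 100
ll_null = n * math.log(0.5)  # null model with a 50/50 event split
r2_n_partial = nagelkerke_r2(ll_null, -50.0, n)  # made-up fitted ln L_M
r2_n_perfect = nagelkerke_r2(ll_null, 0.0, n)    # ln L_M = 0, i.e. L_M = 1
print(round(r2_n_partial, 3), r2_n_perfect)
```

With a perfectly fitting model (L_M = 1) the rescaled index reaches 1, whereas the uncorrected Cox and Snell value would stop at 0.75 for this null model.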

R² by McFadden

The pseudo R² by McFadden (sometimes called the likelihood ratio index^[5]) is defined as

\begin{align}
R^2_\text{McF} = 1 - \frac{\ln(L_M)}{\ln(L_0)}
\end{align}

and is preferred over R²_CS by Allison.^[2] The two expressions R²_McF and R²_CS are then related, respectively, by

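\begin{align}
R^2_\text{CS} &= 1 - L_0^{2 R^2_\text{McF} / n} \\[1.5em]
R^2_\text{McF} &= \frac{n}{2} \cdot \frac{\ln\left(1 - R^2_\text{CS}\right)}{\ln L_0}
\end{align}

Both follow from substituting ln(L_M) = (1 − R²_McF) ln(L₀) into the Cox and Snell definition.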
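A minimal sketch for converting between the two indices from ln L₀ and n alone, so that both directions can be checked against the direct definitions. The conversions used are the ones obtained from the definitions of R²_McF and R²_CS given earlier; the function names and log-likelihood values are hypothetical.

```python
import math

def mcfadden_r2(ll_null, ll_model):
    """R^2_McF = 1 - ln(L_M) / ln(L_0)."""
    return 1.0 - ll_model / ll_null

def cox_snell_r2(ll_null, ll_model, n):
    """R^2_CS = 1 - exp((2/n) * (ln L_0 - ln L_M))."""
    return 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))

def cox_snell_from_mcfadden(r2_mcf, ll_null, n):
    """R^2_CS = 1 - L_0^(2 R^2_McF / n), evaluated through ln L_0."""
    return 1.0 - math.exp((2.0 * r2_mcf / n) * ll_null)

def mcfadden_from_cox_snell(r2_cs, ll_null, n):
    """R^2_McF = (n/2) * ln(1 - R^2_CS) / ln L_0."""
    return (n / 2.0) * math.log(1.0 - r2_cs) / ll_null

n = 100
ll_null, ll_model = n * math.log(0.5), -50.0  # made-up log-likelihoods
r2_mcf = mcfadden_r2(ll_null, ll_model)
r2_cs = cox_snell_r2(ll_null, ll_model, n)
print(round(r2_mcf, 4), round(r2_cs, 4))
```

Converting in either direction should reproduce the directly computed value up to floating-point error.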

R² by Tjur

Allison^[2] prefers Tjur's R², a relatively new measure developed by Tjur.^[6] It can be calculated in two steps:

For each level of the dependent variable, find the mean of the predicted probabilities of an event.
Take the absolute value of the difference between these means.
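The two steps above can be sketched as follows, assuming a binary dependent variable coded 0/1 with at least one observation at each level. The function name and the toy data are hypothetical.

```python
def tjur_r2(y, p):
    """Absolute difference between mean predicted probabilities of the two levels."""
    # Step 1: mean predicted probability of an event within each outcome level.
    p_events = [pi for yi, pi in zip(y, p) if yi == 1]
    p_nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    mean_events = sum(p_events) / len(p_events)
    mean_nonevents = sum(p_nonevents) / len(p_nonevents)
    # Step 2: absolute value of the difference between the two means.
    return abs(mean_events - mean_nonevents)

# Toy data (hypothetical): observed outcomes and made-up fitted probabilities.
y = [0, 0, 1, 1]
p = [0.2, 0.3, 0.7, 0.8]
r2_tjur = tjur_r2(y, p)
print(r2_tjur)
```

Here the mean predicted probability is 0.75 among events and 0.25 among non-events, so the statistic is about 0.5.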

Interpretation

A word of caution is in order when interpreting pseudo-R² statistics. The reason these indices of fit are referred to as pseudo R² is that they do not represent the proportionate reduction in error as the R² in linear regression does.^[1] Linear regression assumes homoscedasticity, that is, that the error variance is the same for all values of the criterion. Logistic regression will always be heteroscedastic: the error variances differ for each value of the predicted score. For each value of the predicted score there would be a different value of the proportionate reduction in error. Therefore, it is inappropriate to think of R² as a proportionate reduction in error in a universal sense in logistic regression.^[1]
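One way to see the heteroscedasticity: a binary outcome with predicted probability p has error variance p(1 − p). This is a standard property of the Bernoulli distribution rather than something stated in the text above. The sketch below, with a hypothetical helper name and made-up predicted scores, tabulates that variance.

```python
def bernoulli_variance(p):
    """Variance of a 0/1 outcome whose event probability is p."""
    return p * (1.0 - p)

# Made-up predicted scores spanning the probability range.
predicted = [0.1, 0.3, 0.5, 0.7, 0.9]
variances = [bernoulli_variance(p) for p in predicted]
for p, v in zip(predicted, variances):
    print(f"p = {p:.1f}  variance = {v:.2f}")
```

The variance peaks at p = 0.5 and shrinks toward the extremes, so no single error variance applies across all predicted scores.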

Notes and References

[1] Cohen, Jacob; Cohen, Patricia; West, Steven G.; Aiken, Leona S. (2002). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (3rd ed.). Routledge. p. 502. ISBN 978-0-8058-2223-6.
[2] Allison, Paul D. Measures of fit for logistic regression. Statistical Horizons LLC and the University of Pennsylvania.
[3] Menard, Scott W. (2002). Applied Logistic Regression (2nd ed.). SAGE. ISBN 978-0-7619-2208-7.
[4] Nagelkerke, N. J. D. (1991). A Note on a General Definition of the Coefficient of Determination. Biometrika, 78(3), 691–692. https://doi.org/10.2307/2337038
[5] Hardin, J. W.; Hilbe, J. M. (2007). Generalized Linear Models and Extensions. USA: Taylor & Francis. p. 60.
[6] Tjur, Tue (2009). Coefficients of determination in logistic regression models. American Statistician, 63(4), 366–372. doi:10.1198/tast.2009.08210. S2CID 121927418.
