In statistics, Somers’ D, sometimes incorrectly referred to as Somer’s D, is a measure of ordinal association between two possibly dependent random variables $X$ and $Y$. Somers’ D takes values between $-1$ and $1$.
Somers’ D plays a central role in rank statistics and is the parameter behind many nonparametric methods.[2] It is also used as a quality measure for binary choice and ordinal regression models (e.g., logistic regression) and for credit scoring models.
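We say that two pairs $(x_i, y_i)$ and $(x_j, y_j)$ are concordant if the ranks of both elements agree, that is, if $x_i > x_j$ and $y_i > y_j$, or if $x_i < x_j$ and $y_i < y_j$. We say that they are discordant if the ranks of the elements disagree, that is, if $x_i > x_j$ and $y_i < y_j$, or if $x_i < x_j$ and $y_i > y_j$. If $x_i = x_j$ or $y_i = y_j$, the pair is tied and is neither concordant nor discordant.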
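The pair classification can be sketched in Python as a brute-force loop over all $n(n-1)/2$ pairs. This is a minimal illustration assuming numeric inputs; the function name `count_pairs` is hypothetical, and the $O(n^2)$ loop is chosen for clarity rather than speed. It relies on the fact that a pair is concordant exactly when $(x_i - x_j)(y_i - y_j) > 0$ and discordant when that product is negative.

```python
from itertools import combinations


def count_pairs(xs, ys):
    """Classify every pair of observations (i < j).

    Returns (n_c, n_d, n_tied): concordant, discordant, and tied pairs,
    where a pair is tied if x_i == x_j or y_i == y_j.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n_c = n_d = n_tied = 0
    for (xi, yi), (xj, yj) in combinations(zip(xs, ys), 2):
        product = (xi - xj) * (yi - yj)
        if product > 0:
            n_c += 1  # both coordinates move in the same direction
        elif product < 0:
            n_d += 1  # the coordinates move in opposite directions
        else:
            n_tied += 1  # x_i == x_j or y_i == y_j
    return n_c, n_d, n_tied
```

For example, `count_pairs([1, 2, 3, 3], [1, 3, 2, 4])` returns `(4, 1, 1)`: four concordant pairs, one discordant pair, and one pair tied on $X$.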
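Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of $n$ observations of two possibly dependent random variables $X$ and $Y$. Define Kendall's tau rank correlation coefficient $\tau$ as

$$\tau = \frac{N_C - N_D}{n(n-1)/2},$$

where $N_C$ is the number of concordant pairs and $N_D$ is the number of discordant pairs. Somers’ D of $Y$ with respect to $X$ is defined as

$$D_{YX} = \frac{\tau(X,Y)}{\tau(X,X)}.$$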
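Kendall's tau is symmetric in $X$ and $Y$, whereas Somers’ D is not. As $\tau(X,X)$ is the fraction of all $n(n-1)/2$ pairs whose $X$ values are unequal, $D_{YX}$ is the difference between the numbers of concordant and discordant pairs, divided by the number of pairs with unequal $X$ values.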
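A minimal sketch of the sample definition, assuming numeric data and using Python's `fractions.Fraction` to keep the results exact. The names `sign`, `kendall_tau_a`, and `somers_d` are illustrative, not a library API. Computing $\tau(X,X)$ with the same function shows why the denominator ends up counting only pairs with unequal $X$ values.

```python
from fractions import Fraction
from itertools import combinations


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_a(xs, ys):
    """Kendall's tau: (N_C - N_D) / (n(n-1)/2), as an exact fraction."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    # Each pair contributes +1 if concordant, -1 if discordant, 0 if tied.
    s = sum(sign(xi - xj) * sign(yi - yj)
            for (xi, yi), (xj, yj) in combinations(zip(xs, ys), 2))
    return Fraction(s, n * (n - 1) // 2)


def somers_d(xs, ys):
    """Somers' D of Y with respect to X: tau(X, Y) / tau(X, X)."""
    tau_xx = kendall_tau_a(xs, xs)  # fraction of pairs with unequal X values
    if tau_xx == 0:
        raise ValueError("all X values are tied, so D_YX is undefined")
    return kendall_tau_a(xs, ys) / tau_xx
```

With `xs = [1, 2, 3, 3]` and `ys = [1, 3, 2, 4]`, `somers_d(xs, ys)` gives `Fraction(3, 5)` (a net 3 concordant pairs over the 5 pairs with unequal $X$ values), while `somers_d(ys, xs)` gives `Fraction(1, 2)`, which illustrates the asymmetry.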
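Let two independent bivariate random variables $(X_1, Y_1)$ and $(X_2, Y_2)$ have the same probability distribution $\operatorname{P}_{XY}$. Somers’ D, which measures the ordinal association of $X$ and $Y$ under $\operatorname{P}_{XY}$, can again be defined through Kendall's tau: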
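$$\begin{aligned}
\tau(X,Y) &= \operatorname{E}\left(\operatorname{sgn}(X_1 - X_2)\operatorname{sgn}(Y_1 - Y_2)\right) \\
&= \operatorname{P}\left(\operatorname{sgn}(X_1 - X_2)\operatorname{sgn}(Y_1 - Y_2) = 1\right) - \operatorname{P}\left(\operatorname{sgn}(X_1 - X_2)\operatorname{sgn}(Y_1 - Y_2) = -1\right),
\end{aligned}$$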
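or the difference between the probabilities of concordance and discordance. Somers’ D of $Y$ with respect to $X$ is defined as

$$D_{YX} = \frac{\tau(X,Y)}{\tau(X,X)}.$$

Since $\tau(X,X) = \operatorname{P}(X_1 \neq X_2)$, $D_{YX}$ is the difference between the probabilities of concordance and discordance, conditional on the $X$ values being unequal. If $X$ has a continuous distribution, then $\tau(X,X) = 1$ and Somers’ D coincides with Kendall's tau.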
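For a discrete joint distribution, the population quantities can be computed exactly by summing over ordered pairs of support points, which enumerates the outcomes of two independent draws. The sketch below assumes the distribution is given as a dictionary mapping $(x, y)$ to its probability; the function name `population_taus` and the example distribution are made up for illustration.

```python
from fractions import Fraction
from itertools import product


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def population_taus(pmf):
    """Return (tau(X, Y), tau(X, X)) for a finite joint pmf {(x, y): probability}.

    Sums over ordered pairs of support points, i.e. over two independent
    draws (X1, Y1) and (X2, Y2) from the same distribution.
    """
    tau_xy = tau_xx = Fraction(0)
    for ((x1, y1), p1), ((x2, y2), p2) in product(pmf.items(), repeat=2):
        sx = sign(x1 - x2)
        tau_xy += p1 * p2 * sx * sign(y1 - y2)
        tau_xx += p1 * p2 * sx * sx  # sums to P(X1 != X2)
    return tau_xy, tau_xx


# Made-up distribution in which X has a mass point at 2.
pmf = {(1, 1): Fraction(1, 4), (2, 1): Fraction(1, 4),
       (2, 2): Fraction(1, 4), (3, 2): Fraction(1, 4)}
tau_xy, tau_xx = population_taus(pmf)
print(tau_xy, tau_xx, tau_xy / tau_xx)  # prints: 3/8 5/8 3/5
```

Here Kendall's tau is $3/8$ but Somers’ D is $3/5$: the mass point at $X = 2$ makes $\tau(X,X) < 1$, and dividing by it rescales tau to the pairs with unequal $X$ values.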
If $X$ and $Y$ are both binary with values 0 and 1, then Somers’ D is the difference between two conditional probabilities:

$$D_{YX} = \operatorname{P}(Y=1 \mid X=1) - \operatorname{P}(Y=1 \mid X=0).$$
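A quick numerical check of this identity, as a sketch: it computes $D_{YX}$ from the pairwise definition and compares it with the difference of conditional probabilities. The function names and the cell probabilities are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def somers_d_yx(pmf):
    """D_YX = tau(X, Y) / tau(X, X) for a finite joint pmf {(x, y): probability}."""
    num = den = Fraction(0)
    for ((x1, y1), p1), ((x2, y2), p2) in product(pmf.items(), repeat=2):
        sx = sign(x1 - x2)
        num += p1 * p2 * sx * sign(y1 - y2)
        den += p1 * p2 * sx * sx
    return num / den


def conditional_difference(pmf):
    """P(Y=1 | X=1) - P(Y=1 | X=0) for binary X and Y coded as 0/1."""
    p_x1 = pmf[(1, 0)] + pmf[(1, 1)]
    p_x0 = pmf[(0, 0)] + pmf[(0, 1)]
    return pmf[(1, 1)] / p_x1 - pmf[(0, 1)] / p_x0


# Hypothetical cell probabilities P(X=x, Y=y), for illustration only.
pmf = {(0, 0): Fraction(3, 10), (0, 1): Fraction(2, 10),
       (1, 0): Fraction(1, 10), (1, 1): Fraction(4, 10)}
print(somers_d_yx(pmf), conditional_difference(pmf))  # prints: 2/5 2/5
```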
In practice, Somers’ D is most often used when the dependent variable $Y$ is binary,[2] that is, for binary classification or the prediction of binary outcomes, including binary choice models in econometrics. Methods for fitting such models include logistic and probit regression.
Several statistics can be used to quantify the quality of such models: the area under the receiver operating characteristic (ROC) curve, Goodman and Kruskal's gamma, Kendall's tau (tau-a), Somers’ D, and others. Somers’ D is probably the most widely used of the available ordinal association statistics.[3] Somers’ D is identical to the Gini coefficient used in this context and is related to the area under the ROC curve (AUC) by[2]
$$\mathrm{AUC} = \frac{D_{XY} + 1}{2}.$$
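The relation can be checked directly. The sketch below assumes the common Mann-Whitney form of the AUC: the probability that a randomly chosen case with $Y=1$ has a higher predictor value than one with $Y=0$, with tied pairs counted as one half. The function name `auc_and_somers_d` and the sample scores are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction


def auc_and_somers_d(scores, labels):
    """Return (AUC, D_XY) for a predictor score X and a binary outcome Y in {0, 1}.

    Only pairs with one Y=1 case and one Y=0 case are compared. Such a pair is
    concordant if the Y=1 case has the higher score, discordant if it has the
    lower score, and tied (counted in N_T) if the scores are equal.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case with each outcome")
    n_c = sum(1 for p in pos for q in neg if p > q)
    n_d = sum(1 for p in pos for q in neg if p < q)
    n_t = len(pos) * len(neg) - n_c - n_d
    total = n_c + n_d + n_t
    # Mann-Whitney AUC: a tied pair counts as one half of a success.
    auc = Fraction(2 * n_c + n_t, 2 * total)
    d_xy = Fraction(n_c - n_d, total)
    return auc, d_xy


auc, d_xy = auc_and_somers_d([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc, d_xy, auc == (d_xy + 1) / 2)  # prints: 3/4 1/2 True
```

Algebraically, $(N_C + N_T/2)/(N_C + N_D + N_T)$ equals $(D_{XY} + 1)/2$, so the check holds for any data, not just this example.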
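In the case where the independent (predictor) variable $X$ is discrete and the dependent (outcome) variable $Y$ is binary, Somers’ D equals

$$D_{XY} = \frac{N_C - N_D}{N_C + N_D + N_T},$$

where $N_T$ is the number of pairs that are tied on $X$ but not on $Y$. Because every pair with unequal $Y$ values is concordant, discordant, or tied on $X$, the denominator is the number of pairs with unequal $Y$ values, so $D_{XY} = \tau(X,Y)/\tau(Y,Y)$ is Somers’ D of $X$ with respect to $Y$.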
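A minimal sketch of this formula on raw observations, following the definitions directly: pairs tied on $Y$ are skipped, and the remaining pairs are sorted into $N_C$, $N_D$, and $N_T$. The function name `somers_d_xy` and the sample data are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def somers_d_xy(xs, ys):
    """Return (D_XY, (N_C, N_D, N_T)) for a predictor X and an outcome Y.

    D_XY = (N_C - N_D) / (N_C + N_D + N_T), where N_T counts pairs tied
    on X but not on Y. Pairs tied on Y do not enter the formula.
    """
    n_c = n_d = n_t = 0
    for (xi, yi), (xj, yj) in combinations(zip(xs, ys), 2):
        if yi == yj:
            continue  # tied on the outcome: skipped
        if xi == xj:
            n_t += 1  # tied on X but not on Y
        elif (xi > xj) == (yi > yj):
            n_c += 1  # X and Y ordered the same way
        else:
            n_d += 1  # X and Y ordered in opposite ways
    total = n_c + n_d + n_t
    if total == 0:
        raise ValueError("no pairs with unequal Y values")
    return Fraction(n_c - n_d, total), (n_c, n_d, n_t)


d, counts = somers_d_xy([1, 1, 2, 2, 3], [0, 1, 0, 1, 1])
print(d, counts)  # prints: 1/3 (3, 1, 2)
```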
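Suppose that the independent (predictor) variable $X$ takes three ordered values $a < b < c$, and the dependent (outcome) variable $Y$ takes two values, 0 or 1. The observed counts of each combination of $X$ and $Y$ are:

$$\begin{array}{c|ccc}
 & X = a & X = b & X = c \\ \hline
Y = 0 & 3 & 5 & 2 \\
Y = 1 & 1 & 7 & 6
\end{array}$$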
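The number of concordant pairs (a $Y=0$ case at a lower $X$ value paired with a $Y=1$ case at a higher one) equals

$$N_C = 3 \times 7 + 3 \times 6 + 5 \times 6 = 69.$$

The number of discordant pairs equals

$$N_D = 1 \times 5 + 1 \times 2 + 7 \times 2 = 21.$$

Of the $(3+5+2) \times (1+7+6) = 140$ pairs with unequal $Y$ values, the remaining ones are tied on $X$:

$$N_T = (3+5+2) \times (1+7+6) - 69 - 21 = 50.$$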
Thus, Somers’ D equals

$$D_{XY} = \frac{69 - 21}{69 + 21 + 50} = \frac{48}{140} \approx 0.34.$$
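The arithmetic of this example can be reproduced from the contingency table alone. This is a sketch assuming the counts are listed in increasing order of $X$; the function name `somers_d_from_table` is hypothetical.

```python
from fractions import Fraction


def somers_d_from_table(counts_y0, counts_y1):
    """Return (N_C, N_D, N_T, D_XY) from counts of Y=0 and Y=1 at each X level.

    Both lists must be ordered by increasing X.
    """
    k = len(counts_y0)
    # Concordant: a Y=0 case at a lower X level with a Y=1 case at a higher one.
    n_c = sum(counts_y0[i] * counts_y1[j] for i in range(k) for j in range(i + 1, k))
    # Discordant: a Y=1 case at a lower X level with a Y=0 case at a higher one.
    n_d = sum(counts_y1[i] * counts_y0[j] for i in range(k) for j in range(i + 1, k))
    # Every pair with unequal Y is concordant, discordant, or tied on X.
    n_t = sum(counts_y0) * sum(counts_y1) - n_c - n_d
    return n_c, n_d, n_t, Fraction(n_c - n_d, n_c + n_d + n_t)


n_c, n_d, n_t, d_xy = somers_d_from_table([3, 5, 2], [1, 7, 6])
print(n_c, n_d, n_t, d_xy, round(float(d_xy), 2))  # prints: 69 21 50 12/35 0.34
```

Keeping the result as a `Fraction` shows the exact value $48/140 = 12/35$ before rounding.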