In statistics, Cramér's V (sometimes referred to as Cramér's phi and denoted as φc) is a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive). It is based on Pearson's chi-squared statistic and was published by Harald Cramér in 1946.[1]
φc is the intercorrelation of two discrete variables[2] and may be used with variables having two or more levels. φc is a symmetrical measure: it does not matter which variable we place in the columns and which in the rows. Also, the order of rows/columns does not matter, so φc may be used with nominal data types or higher (notably, ordered or numerical).
Cramér's V varies from 0 (corresponding to no association between the variables) to 1 (complete association) and can reach 1 only when each variable is completely determined by the other. It may be viewed as the association between two variables as a percentage of their maximum possible variation.
φc² is the mean square canonical correlation between the variables.
In the case of a 2 × 2 contingency table, Cramér's V is equal to the absolute value of the phi coefficient.
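The 2 × 2 identity can be checked numerically. The following is a minimal sketch, assuming the table is passed as four cell counts a, b, c, d (rows [a, b] and [c, d]) with no empty row or column; the function names phi_2x2 and cramers_v_2x2 are illustrative and do not come from any library.

```python
from math import sqrt

def phi_2x2(a, b, c, d):
    """Phi coefficient of the 2 x 2 table [[a, b], [c, d]] (hypothetical helper)."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

def cramers_v_2x2(a, b, c, d):
    """Cramér's V of the same table via Pearson's chi-squared statistic."""
    cells = ((a, b), (c, d))
    rows = (a + b, c + d)          # row totals
    cols = (a + c, b + d)          # column totals
    n = a + b + c + d              # sample size
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (cells[i][j] - expected) ** 2 / expected
    # For a 2 x 2 table the minimum dimension minus 1 equals 1.
    return sqrt(chi2 / n)

phi = phi_2x2(3, 7, 8, 2)          # negative: the association is "off-diagonal"
v = cramers_v_2x2(3, 7, 8, 2)      # non-negative by construction
```

Here phi is negative while v is positive, and the two agree in absolute value up to floating-point error.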
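Let a sample of size n of the simultaneously distributed variables A and B, for i = 1, …, r; j = 1, …, k, be given by the frequencies

$$n_{ij} = \text{number of times the values } (A_i, B_j) \text{ were observed.}$$

The chi-squared statistic then is:

$$\chi^2 = \sum_{i,j} \frac{\left(n_{ij} - \frac{n_{i.}\, n_{.j}}{n}\right)^2}{\frac{n_{i.}\, n_{.j}}{n}},$$

where $n_{i.} = \sum_j n_{ij}$ is the number of times the value $A_i$ is observed and $n_{.j} = \sum_i n_{ij}$ is the number of times the value $B_j$ is observed.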
Cramér's V is computed by taking the square root of the chi-squared statistic divided by the sample size and the minimum dimension minus 1:
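$$V = \sqrt{\frac{\varphi^2}{\min(k-1,\, r-1)}} = \sqrt{\frac{\chi^2/n}{\min(k-1,\, r-1)}},$$

where $\varphi^2 = \chi^2/n$ is the squared phi coefficient, $\chi^2$ is Pearson's chi-squared statistic, $n$ is the grand total of observations, $k$ is the number of columns, and $r$ is the number of rows.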
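The calculation can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not a reference one: the table is a list of r rows of k counts each, r and k are both at least 2, and no row or column total is zero (otherwise an expected count would be zero). The name cramers_v is chosen here for illustration.

```python
from math import sqrt

def cramers_v(table):
    """Cramér's V for an r x k contingency table given as a list of rows of counts."""
    r = len(table)                                                  # number of rows
    k = len(table[0])                                               # number of columns
    row_totals = [sum(row) for row in table]                        # n_i.
    col_totals = [sum(row[j] for row in table) for j in range(k)]   # n_.j
    n = sum(row_totals)                                             # sample size

    # Pearson's chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Divide by the sample size and by the minimum dimension minus 1.
    return sqrt((chi2 / n) / min(k - 1, r - 1))

perfect = cramers_v([[5, 0, 0], [0, 5, 0], [0, 0, 5]])   # each variable determines the other
independent = cramers_v([[10, 20], [20, 40]])            # rows are proportional
```

For the diagonal table the result is 1 and for the table with proportional rows it is 0, both up to floating-point error, matching the two extremes of the measure.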
The p-value for the significance of V is the same one that is calculated using Pearson's chi-squared test.
The formula for the variance of V=φc is known.[3]
In R, the function cramerV from the package rcompanion[4] calculates V using the chisq.test function from the stats package. In contrast to the function cramersV from the lsr[5] package, cramerV also offers an option to correct for bias. It applies the correction described in the following section.
Cramér's V can be a heavily biased estimator of its population counterpart and will tend to overestimate the strength of association. A bias correction, using the above notation, is given by[6]
$$\tilde V = \sqrt{\frac{\tilde\varphi^2}{\min(\tilde k - 1,\, \tilde r - 1)}},$$

where

$$\tilde\varphi^2 = \max\left(0,\ \varphi^2 - \frac{(k-1)(r-1)}{n-1}\right)$$

and

$$\tilde k = k - \frac{(k-1)^2}{n-1}, \qquad \tilde r = r - \frac{(r-1)^2}{n-1}.$$
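Here $\tilde V$ is the bias-corrected estimate. The subtracted term reflects that, under independence, $E[\varphi^2] = \frac{(k-1)(r-1)}{n-1}$.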
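The bias correction can be sketched in Python as follows. This is a minimal illustration, assuming the table is a list of r rows of k counts with no zero row or column total, and that n is large enough relative to r and k for the corrected dimensions to stay above 1. The names chi_squared and cramers_v_corrected are hypothetical and do not correspond to the R functions mentioned earlier.

```python
from math import sqrt

def chi_squared(table):
    """Pearson's chi-squared statistic and sample size for a table of counts."""
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V, following the formulas in this section."""
    r, k = len(table), len(table[0])
    chi2, n = chi_squared(table)
    phi2 = chi2 / n
    # Subtract the expected value of phi^2 under independence, truncating at 0.
    phi2_tilde = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    # Corrected numbers of columns and rows.
    k_tilde = k - (k - 1) ** 2 / (n - 1)
    r_tilde = r - (r - 1) ** 2 / (n - 1)
    return sqrt(phi2_tilde / min(k_tilde - 1, r_tilde - 1))

weak = cramers_v_corrected([[12, 8], [9, 11]])      # weak sample association
strong = cramers_v_corrected([[10, 0], [0, 10]])    # perfect association
```

For the first table the uncorrected φ² (about 0.023) is below the subtracted term 1/39, so the corrected value is truncated to 0; for the perfectly associated table the corrected value remains 1 up to floating-point error.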