Cramér's V Explained

In statistics, Cramér's V (sometimes referred to as Cramér's phi and denoted as φ_c) is a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive). It is based on Pearson's chi-squared statistic and was published by Harald Cramér in 1946.^[1]

Usage and interpretation

φ_c is the intercorrelation of two discrete variables^[2] and may be used with variables having two or more levels. φ_c is a symmetrical measure: it does not matter which variable we place in the columns and which in the rows. Also, the order of rows/columns does not matter, so φ_c may be used with nominal data types or higher (notably, ordered or numerical).
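The symmetry and order-invariance described above can be checked numerically. The following is a minimal sketch, assuming the formula from the Calculation section; the helper name `cramers_v` and the counts are hypothetical, chosen only for illustration.

```python
import math

def cramers_v(table):
    """Cramér's V for a table of counts given as a list of rows.

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(row_totals))
        for j in range(len(col_totals))
    )
    return math.sqrt(chi2 / n / min(len(row_totals) - 1, len(col_totals) - 1))

# Hypothetical counts, for illustration only.
table = [[12, 5, 3],
         [4, 9, 7]]
transposed = [list(col) for col in zip(*table)]              # swap rows and columns
rows_swapped = [table[1], table[0]]                          # reorder the rows
cols_shuffled = [[row[2], row[0], row[1]] for row in table]  # reorder the columns

v = cramers_v(table)
for variant in (transposed, rows_swapped, cols_shuffled):
    # The values agree up to floating-point rounding.
    assert math.isclose(cramers_v(variant), v, rel_tol=1e-12)
print(v)
```

Each cell contributes to the chi-squared sum independently of its position, which is why transposing or reordering the table leaves the result unchanged.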

Cramér's V varies from 0 (corresponding to no association between the variables) to 1 (complete association) and can reach 1 only when each variable is completely determined by the other. It may be viewed as the association between two variables as a percentage of their maximum possible variation.

φ_c² is the mean square canonical correlation between the variables.

In the case of a 2 × 2 contingency table, Cramér's V is equal to the absolute value of the phi coefficient.
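As a rough check of the 2 × 2 case, the sketch below compares Cramér's V with the phi coefficient computed from the standard cross-product formula (ad − bc)/√((a+b)(c+d)(a+c)(b+d)). The function names and the counts are hypothetical, and the cross-product formula is not taken from this article.

```python
import math

def phi_2x2(a, b, c, d):
    """Signed phi coefficient of the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def cramers_v_2x2(a, b, c, d):
    """Cramér's V of the same table, via the chi-squared statistic."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    cells = ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1))
    chi2 = 0.0
    for observed, i, j in cells:
        expected = rows[i] * cols[j] / n
        chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / n)  # min(k - 1, r - 1) = 1 for a 2 x 2 table

# Hypothetical counts, for illustration only.
phi = phi_2x2(10, 30, 25, 15)
v = cramers_v_2x2(10, 30, 25, 15)
print(phi, v)  # phi is negative here; v matches its absolute value
```

The sign of phi depends on how the categories are labelled, whereas V discards that sign.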

Calculation

Let a sample of size n of the simultaneously distributed variables A and B, for i=1,\ldots,r; j=1,\ldots,k, be given by the frequencies

n_{ij} = number of times the values (A_i,B_j) were observed.

The chi-squared statistic then is:

\chi^2=\sum_{i,j}\frac{\left(n_{ij}-\frac{n_{i.}n_{.j}}{n}\right)^2}{\frac{n_{i.}n_{.j}}{n}}

where n_{i.}=\sum_j n_{ij} is the number of times the value A_i is observed and n_{.j}=\sum_i n_{ij} is the number of times the value B_j is observed.
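The chi-squared statistic can be computed directly from a table of counts. The following is a minimal sketch in plain Python; the function name `chi_squared` and the example table are hypothetical, and the code assumes that no row or column total is zero.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for an r x k table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]        # n_i.
    col_totals = [sum(col) for col in zip(*table)]  # n_.j
    n = sum(row_totals)                             # grand total
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: n_i. * n_.j / n
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical 2 x 3 table of counts, for illustration only.
example = [[10, 20, 30],
           [20, 20, 20]]
print(chi_squared(example))
```

For this table the expected counts are 15, 20 and 25 in each row, so the statistic works out to 16/3.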

Cramér's V is computed by taking the square root of the chi-squared statistic divided by the sample size and the minimum dimension minus 1:

V=\sqrt{\frac{\varphi^2}{\min(k-1,r-1)}}=\sqrt{\frac{\chi^2/n}{\min(k-1,r-1)}}

where:

\varphi is the phi coefficient,
\chi^2 is derived from Pearson's chi-squared test,
n is the grand total of observations,
k is the number of columns, and
r is the number of rows.
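Put together, the computation might look like the sketch below. The function name `cramers_v` and the example tables are hypothetical; the code assumes a table with at least two rows and two columns and no zero row or column totals.

```python
import math

def cramers_v(table):
    """Cramér's V for an r x k table of counts given as a list of rows."""
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n                       # squared phi coefficient
    return math.sqrt(phi2 / min(k - 1, r - 1))

# Hypothetical tables, for illustration only.
independent = [[10, 20], [20, 40]]            # rows are proportional
perfect = [[5, 0, 0], [0, 7, 0], [0, 0, 3]]   # each variable determines the other
mixed = [[10, 20, 30], [20, 20, 20]]

print(cramers_v(independent), cramers_v(perfect), cramers_v(mixed))
```

The first two tables illustrate the extremes of the range: proportional rows give a value of 0, and a table in which each variable determines the other gives a value of 1 (up to floating-point rounding).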

The p-value for the significance of V is the same one that is calculated using Pearson's chi-squared test.
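The p-value is the upper-tail probability of the chi-squared distribution at the observed statistic, with (r-1)(k-1) degrees of freedom. The degrees of freedom are the usual ones for Pearson's test; they are not stated in this article. The sketch below uses only the Python standard library, and the helper names `chi2_sf` and `chi2_p_value` are hypothetical. A statistics library would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable, integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi2_p_value(table):
    """p-value of Pearson's chi-squared test for a table of counts (list of rows)."""
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(r)
        for j in range(k)
    )
    return chi2_sf(chi2, (r - 1) * (k - 1))

# Hypothetical 2 x 3 table of counts, for illustration only.
print(chi2_p_value([[10, 20, 30], [20, 20, 20]]))
```

With two degrees of freedom the upper-tail probability reduces to exp(-χ²/2), which gives a convenient way to check the result for this table by hand.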

The formula for the variance of V=φ_c is known.^[3]

In R, the function cramerV from the package rcompanion^[4] calculates V using the chisq.test function from the stats package. In contrast to the function cramersV from the lsr^[5] package, cramerV also offers an option to correct for bias. It applies the correction described in the following section.

Bias correction

Cramér's V can be a heavily biased estimator of its population counterpart and will tend to overestimate the strength of association. A bias correction, using the above notation, is given by^[6]

\tilde{V}=\sqrt{\frac{\tilde\varphi^2}{\min(\tilde{k}-1,\tilde{r}-1)}}

where

\tilde\varphi^2=\max\left(0,\varphi^2-\frac{(k-1)(r-1)}{n-1}\right)

and

\tilde{k}=k-\frac{(k-1)^2}{n-1}

\tilde{r}=r-\frac{(r-1)^2}{n-1}

Then \tilde{V} estimates the same population quantity as Cramér's V but with typically much smaller mean squared error. The rationale for the correction is that under independence,

E[\varphi^2]=\frac{(k-1)(r-1)}{n-1}.^[7]
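The corrected estimator can be sketched in the same way as the uncorrected one. The function name `cramers_v_corrected` and the example table are hypothetical; the code assumes no zero row or column totals and a sample large enough that the corrected dimensions exceed 1.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table of counts (list of rows)."""
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Subtract the expected value of phi^2 under independence, floored at 0.
    phi2_tilde = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_tilde = k - (k - 1) ** 2 / (n - 1)
    r_tilde = r - (r - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_tilde / min(k_tilde - 1, r_tilde - 1))

# Hypothetical 2 x 3 table of counts, for illustration only.
example = [[10, 20, 30],
           [20, 20, 20]]
print(cramers_v_corrected(example))
```

For this table the corrected value is smaller than the uncorrected V, and a table with exactly proportional rows yields 0 because the corrected \tilde\varphi^2 is floored at zero.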

External links

A Measure of Association for Nonparametric Statistics (Alan C. Acock and Gordon R. Stavig, pages 1381–1386)
Nominal Association: Phi and Cramer's V, from the homepage of Pat Dattalo.

Notes and References

Cramér, Harald (1946). Mathematical Methods of Statistics. Princeton: Princeton University Press, page 282 (Chapter 21. The two-dimensional case).
Sheskin, David J. (1997). Handbook of Parametric and Nonparametric Statistical Procedures. Boca Raton, FL: CRC Press.
Liebetrau, Albert M. (1983). Measures of Association. Newbury Park, CA: Sage Publications. Quantitative Applications in the Social Sciences Series No. 32 (pages 15–16).
Web site: rcompanion: Functions to Support Extension Education Program Evaluation. 2019-01-03.
Web site: lsr: Companion to "Learning Statistics with R". 2015-03-02.
Bergsma, Wicher (2013). A bias correction for Cramér's V and Tschuprow's T. Journal of the Korean Statistical Society 42 (3): 323–328. doi:10.1016/j.jkss.2012.10.002.
Bartlett, Maurice S. (1937). Properties of Sufficiency and Statistical Tests. Proceedings of the Royal Society of London, Series A 160 (901): 268–282. JSTOR 96803. doi:10.1098/rspa.1937.0109. Bibcode:1937RSPSA.160..268B.
Tyler, Scott R.; Bunyavanich, Supinda; Schadt, Eric E. (2021-11-19). PMD Uncovers Widespread Cell-State Erasure by scRNAseq Batch Correction Methods. bioRxiv 2021.11.15.468733. doi:10.1101/2021.11.15.468733.