Tschuprow's T Explained

In statistics, Tschuprow's T is a measure of association between two nominal variables, giving a value between 0 and 1 (inclusive). It is closely related to Cramér's V, coinciding with it for square contingency tables. It was published by Alexander Tschuprow (alternative spelling: Chuprov) in 1939.[1]

Definition

For an r × c contingency table with r rows and c columns, let $\pi_{ij}$ be the proportion of the population in cell $(i,j)$, and let

$$\pi_{i+} = \sum_{j=1}^{c} \pi_{ij} \qquad \text{and} \qquad \pi_{+j} = \sum_{i=1}^{r} \pi_{ij}$$

be the row and column marginal proportions.

Then the mean square contingency is given as

$$\phi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left(\pi_{ij} - \pi_{i+}\pi_{+j}\right)^2}{\pi_{i+}\pi_{+j}} \,,$$

and Tschuprow's T as

$$T = \sqrt{\frac{\phi^2}{\sqrt{(r-1)(c-1)}}} \,.$$
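
As an illustration of the definition, here is a minimal sketch (in Python with NumPy, not part of the original article) that computes $\phi^2$ and T from a matrix of joint proportions $\pi_{ij}$; the function name tschuprow_t and the example table are hypothetical, chosen only for illustration.

    import numpy as np

    def tschuprow_t(pi):
        """Tschuprow's T for an r x c table of joint population proportions (summing to 1)."""
        pi = np.asarray(pi, dtype=float)
        r, c = pi.shape
        expected = np.outer(pi.sum(axis=1), pi.sum(axis=0))   # pi_{i+} * pi_{+j}
        phi2 = ((pi - expected) ** 2 / expected).sum()        # mean square contingency
        return np.sqrt(phi2 / np.sqrt((r - 1) * (c - 1)))

    # Hypothetical 2 x 3 table of joint proportions
    pi = np.array([[0.2, 0.1, 0.1],
                   [0.1, 0.3, 0.2]])
    print(tschuprow_t(pi))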

Properties

T equals zero if and only if independence holds in the table, i.e., if and only if $\pi_{ij} = \pi_{i+}\pi_{+j}$ for every cell $(i,j)$. T equals one if and only if there is perfect dependence in the table, i.e., if and only if for each i there is only one j such that $\pi_{ij} > 0$, and vice versa. Hence, T can only equal 1 for square tables. In this it differs from Cramér's V, which can equal 1 for any rectangular table.
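
To make the boundary cases concrete, reusing the tschuprow_t sketch above (itself an illustrative assumption, not part of the original article): a diagonal square table exhibits perfect dependence and gives T = 1, while a table whose cells are the products of its margins gives T = 0.

    # Perfect dependence: each row has exactly one nonzero cell, and vice versa.
    perfect = np.array([[0.5, 0.0],
                        [0.0, 0.5]])
    print(tschuprow_t(perfect))   # 1.0

    # Independence: every cell equals the product of its margins, so phi^2 = 0.
    indep = np.outer([0.4, 0.6], [0.3, 0.7])
    print(tschuprow_t(indep))     # 0.0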

Estimation

If we have a multinomial sample of size n, the usual way to estimate T from the data is via the formula

$$\hat{T} = \sqrt{\frac{\displaystyle\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left(p_{ij} - p_{i+}p_{+j}\right)^2}{p_{i+}p_{+j}}}{\sqrt{(r-1)(c-1)}}} \,,$$

where $p_{ij} = n_{ij}/n$ is the proportion of the sample in cell $(i,j)$. This is the empirical value of T. With $\chi^2$ denoting the Pearson chi-square statistic, this formula can also be written as

$$\hat{T} = \sqrt{\frac{\chi^2/n}{\sqrt{(r-1)(c-1)}}} \,.$$
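
The estimator can be computed directly from a table of observed counts. The sketch below (again hypothetical Python/NumPy code, with the function name tschuprow_t_hat and the example counts chosen only for illustration) forms the Pearson chi-square statistic and applies the formula above.

    import numpy as np

    def tschuprow_t_hat(counts):
        """Empirical Tschuprow's T from an r x c table of observed counts n_ij."""
        counts = np.asarray(counts, dtype=float)
        n = counts.sum()
        r, c = counts.shape
        expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / n  # n * p_{i+} * p_{+j}
        chi2 = ((counts - expected) ** 2 / expected).sum()               # Pearson chi-square
        return np.sqrt((chi2 / n) / np.sqrt((r - 1) * (c - 1)))

    # Hypothetical 3 x 2 table of counts
    obs = np.array([[10, 20],
                    [30, 25],
                    [15, 40]])
    print(tschuprow_t_hat(obs))

Recent versions of SciPy also expose this statistic through scipy.stats.contingency.association with method="tschuprow", which can serve as a cross-check if that function is available in your installation.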


References

  1. Tschuprow, A. A. (1939). Principles of the Mathematical Theory of Correlation; translated by M. Kantorowitsch. W. Hodge & Co.