Chi-square automatic interaction detection explained

Chi-square automatic interaction detection (CHAID)[1][2] is a decision tree technique based on adjusted significance testing (Bonferroni correction, Holm-Bonferroni testing). The technique was developed in South Africa in 1975 and was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on the topic. CHAID can be used for prediction (in a similar fashion to regression analysis; this version of CHAID was originally known as XAID), for classification, and for detecting interactions between variables. CHAID is based on a formal extension of the AID (Automatic Interaction Detection)[3] and THAID (THeta Automatic Interaction Detection)[4][5] procedures of the 1960s and 1970s, which in turn extended earlier research, including work by Belson in the UK in the 1950s.[6] A history of earlier supervised tree methods, together with a detailed description of the original CHAID algorithm and of the exhaustive CHAID extension by Biggs, De Ville, and Suen,[2] can be found in Ritschard.[7]
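The significance-testing idea behind a single CHAID split can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Kass's full algorithm. For each categorical predictor it cross-tabulates the predictor against the target, computes the Pearson chi-square statistic (the sum over cells of (observed - expected)^2 / expected), converts it to a p-value, and applies a Bonferroni adjustment before choosing the predictor with the smallest adjusted p-value. Two simplifications are assumed: the sketch multiplies each p-value by the number of predictors tested, whereas Kass's CHAID uses a per-predictor multiplier that counts the possible ways of merging that predictor's categories, and it omits the category-merging step entirely. The function names `chi2_sf`, `pearson_chi2` and `best_split`, the `alpha=0.05` default, and the toy predictors are all hypothetical. The chi-square tail probability is computed with the standard library only, as a regularized upper incomplete gamma function.

```python
import math
from collections import Counter


def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable with df degrees of freedom.

    Equals the regularized upper incomplete gamma Q(df/2, x/2); uses a power
    series when x/2 < df/2 + 1 and a continued fraction otherwise.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    prefactor = math.exp(a * math.log(z) - z - math.lgamma(a))
    if z < a + 1:
        # Series for the lower regularized gamma P(a, z); return Q = 1 - P.
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= z / (a + n)
            total += term
        return max(0.0, 1.0 - total * prefactor)
    # Continued fraction for Q(a, z), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return prefactor * h


def pearson_chi2(predictor, target):
    """Pearson chi-square statistic and degrees of freedom for the cross-tab of two categorical lists."""
    n = len(target)
    rows = Counter(predictor)
    cols = Counter(target)
    observed = Counter(zip(predictor, target))
    stat = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n  # expected count under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)


def best_split(predictors: dict, target: list, alpha: float = 0.05):
    """Return (name, adjusted p) of the best predictor, or (None, p) if none is significant.

    Simplified Bonferroni adjustment: each p-value is multiplied by the number of
    predictors tested (Kass's category-merge multipliers are not used here).
    """
    m = len(predictors)
    adjusted = {}
    for name, values in predictors.items():
        stat, df = pearson_chi2(values, target)
        p = chi2_sf(stat, df) if df > 0 else 1.0
        adjusted[name] = min(1.0, p * m)
    name = min(adjusted, key=adjusted.get)
    return (name, adjusted[name]) if adjusted[name] < alpha else (None, adjusted[name])


if __name__ == "__main__":
    target = ["yes", "no"] * 20
    predictors = {
        # Hypothetical toy predictors: one tracks the target exactly, one is independent of it.
        "strong": ["a" if t == "yes" else "b" for t in target],
        "noise": ["x", "x", "y", "y"] * 10,
    }
    print(best_split(predictors, target))
```

On this toy data the "noise" predictor has a chi-square statistic of exactly 0, so `best_split` selects "strong". A full CHAID implementation would repeat this test recursively in each child node until no adjusted p-value falls below the threshold.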

In practice, CHAID is often used in direct marketing to select groups of consumers and predict how their responses to some variables affect other variables, although other early applications were in medical and psychiatric research.

Like other decision trees, CHAID produces output that is highly visual and easy to interpret. Because it uses multiway splits by default, it needs rather large sample sizes to work effectively: with small samples, the respondent groups can quickly become too small for reliable analysis.
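Why multiway splits consume data quickly can be illustrated with a small Python sketch. It partitions a node's rows into one child per category of a predictor, as a multiway split does, and flags children that fall below a minimum size. The helper names `multiway_split` and `undersized_children`, the `min_size` threshold, and the 60-respondent example node are hypothetical; the threshold stands in for whatever stopping rule a given implementation uses.

```python
from collections import defaultdict


def multiway_split(rows, key):
    """Partition rows into one child per distinct value of row[key] (a multiway split)."""
    children = defaultdict(list)
    for row in rows:
        children[row[key]].append(row)
    return dict(children)


def undersized_children(children, min_size):
    """Category values whose child node holds fewer than min_size rows."""
    return sorted(value for value, members in children.items() if len(members) < min_size)


if __name__ == "__main__":
    # Hypothetical node with 60 respondents and an uneven four-category predictor.
    node = [{"region": r} for r in ["N"] * 30 + ["S"] * 20 + ["E"] * 6 + ["W"] * 4]
    children = multiway_split(node, "region")
    print({value: len(members) for value, members in children.items()})
    # prints {'N': 30, 'S': 20, 'E': 6, 'W': 4}
    print(undersized_children(children, min_size=10))
    # prints ['E', 'W']
```

Each further level divides these counts again: two consecutive four-way splits can turn a node of n respondents into as many as 16 groups averaging n/16 each, whereas two binary splits yield at most 4 groups.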

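One important advantage of CHAID over alternatives such as multiple regression is that it is non-parametric: it does not require the analyst to specify a functional form (such as a linear relationship) linking the predictors to the response.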

Notes and References

  1. Kass, G. V. (1980). "An Exploratory Technique for Investigating Large Quantities of Categorical Data". Applied Statistics. 29 (2): 119–127. doi:10.2307/2986296. JSTOR 2986296.
  2. Biggs, David; De Ville, Barry; Suen, Ed (1991). "A method of choosing multiway partitions for classification and decision trees". Journal of Applied Statistics. 18 (1): 49–62. doi:10.1080/02664769100000005. ISSN 0266-4763.
  3. Morgan, James N.; Sonquist, John A. (1963). "Problems in the Analysis of Survey Data, and a Proposal". Journal of the American Statistical Association. 58 (302): 415–434. doi:10.1080/01621459.1963.10500855. ISSN 0162-1459.
  4. Messenger, Robert; Mandell, Lewis (1972). "A Modal Search Technique for Predictive Nominal Scale Multivariate Analysis". Journal of the American Statistical Association. 67 (340): 768–772. doi:10.1080/01621459.1972.10481290. ISSN 0162-1459.
  5. Morgan, James N.; Messenger, Robert C. (1973). THAID, a sequential analysis program for the analysis of nominal scale dependent variables. Ann Arbor, Mich. ISBN 0-87944-137-2. OCLC 666930.
  6. Belson, William A. (1959). "Matching and Prediction on the Principle of Biological Classification". Applied Statistics. 8 (2): 65–75. doi:10.2307/2985543. JSTOR 2985543.
  7. Ritschard, Gilbert (2013). "CHAID and Earlier Supervised Tree Methods". In McArdle, J. J.; Ritschard, G. (eds.). Contemporary Issues in Exploratory Data Mining in the Behavioral Sciences. New York: Routledge. pp. 48–74.