In statistics, qualitative comparative analysis (QCA) is a data analysis based on set theory to examine the relationship of conditions to outcome. QCA describes the relationship in terms of necessary conditions and sufficient conditions.[1] The technique was originally developed by Charles Ragin in 1987[2] to study data sets that are too small for linear regression analysis but large for cross-case analysis.[3]
In the case of categorical variables, QCA begins by listing and counting all types of cases which occur, where each type of case is defined by its unique combination of values of its independent and dependent variables. For instance, if there were four categorical variables of interest,, and A and B were dichotomous (could take on two values), C could take on five values, and D could take on three, then there would be 60 possible types of observations determined by the possible combinations of variables, not all of which would necessarily occur in real life. By counting the number of observations that exist for each of the 60 unique combination of variables, QCA can determine which descriptive inferences or implications are empirically supported by a data set. Thus, the input to QCA is a data set of any size, from small-N to large-N, and the output of QCA is a set of descriptive inferences or implications the data supports.
In QCA's next step, inferential logic or Boolean algebra is used to simplify or reduce the number of inferences to the minimum set of inferences supported by the data. This reduced set of inferences is termed the "prime implicates" by QCA adherents. For instance, if the presence of conditions A and B is always associated with the presence of a particular value of D, regardless of the observed value of C, then the value that C takes is irrelevant. Thus, all five inferences involving A and B and any of the five values of C may be replaced by the single descriptive inference "(A and B) implies the particular value of D".
To establish that the prime implicants or descriptive inferences derived from the data by the QCA method are causal requires establishing the existence of causal mechanism using another method such as process tracing, formal logic, intervening variables, or established multidisciplinary knowledge.[4] The method is used in social science and is based on the binary logic of Boolean algebra, and attempts to ensure that all possible combinations of variables that can be made across the cases under investigation are considered.
The technique of listing case types by potential variable combinations assists with case selection by making investigators aware of all possible case types that would need to be investigated, at a minimum, if they exist, in order to test a certain hypothesis or to derive new inferences from an existing data set. In situations where the available observations constitute the entire population of cases, this method alleviates the small N problem by allowing inferences to be drawn by evaluating and comparing the number of cases exhibiting each combination of variables. The small N problem arises when the number of units of analysis (e.g. countries) available is inherently limited. For example: a study where countries are the unit of analysis is limited in that are only a limited number of countries in the world (less than 200), less than necessary for some (probabilistic) statistical techniques. By maximizing the number of comparisons that can be made across the cases under investigation, causal inferences are according to Ragin possible.[5] This technique allows the identification of multiple causal pathways and interaction effects that may not be detectable via statistical analysis that typically requires its data set to conform to one model. Thus, it is the first step to identifying subsets of a data set conforming to particular causal pathway based on the combinations of covariates prior to quantitative statistical analyses testing conformance to a model; and helps qualitative researchers to correctly limit the scope of claimed findings to the type of observations they analyze.
As this is a logical (deterministic) and not a statistical (probabilistic) technique, with "crisp-set" QCA (csQCA), the original application of QCA, variables can only have two values, which is problematic as the researcher has to determine the values of each variable. For example: GDP per capita has to be divided by the researcher in two categories (e.g. low = 0 and high = 1). But as this variable is essentially a continuous variable, the division will always be arbitrary. A second, related problem is that the technique does not allow an assessment of the effect of the relative strengths of the independent variables (as they can only have two values).[5] Ragin, and other scholars such as Lasse Cronqvist, have tried to deal with these issues by developing new tools that extend QCA, such as multi-value QCA (mvQCA) and fuzzy set QCA (fsQCA). Note: Multi-value QCA is simply QCA applied to observations having categorical variables with more than two values. Crisp-Set QCA can be considered a special case of Multi-value QCA.
Statistical methodologists have argued that QCA's strong assumptions render its findings both fragile and prone to type I error. Simon Hug argues that deterministic hypotheses and error-free measures are exceedingly rare in social science and uses Monte Carlo simulations to demonstrate the fragility of QCA results if either assumption is violated.[6] Chris Krogslund, Donghyun Danny Choi, and Mathias Poertner further demonstrate that QCA results are highly sensitive to minor parametric and model-susceptibility changes and are vulnerable to type I error.[7] Bear F. Braumoeller further explores the vulnerability of the QCA family of techniques to both type I error and multiple inference.[8] Braumoeller also offers a formal test of the null hypothesis and demonstrates that even very convincing QCA findings may be the result of chance.[9]
QCA can be performed probabilistically or deterministically with observations of categorical variables. For instance, the existence of a descriptive inference or implication is supported deterministically by the absence of any counter-example cases to the inference; i.e. if a researcher claims condition X implies condition Y, then, deterministically, there must not exist any counterexample cases having condition X, but not condition Y. However, if the researcher wants to claim that condition X is a probabilistic 'predictor' of condition Y, in another similar set of cases, then the proportion of counterexample cases to an inference to the proportion of cases having that same combination of conditions can be set at a threshold value of for example 80% or higher. For each prime implicant that QCA outputs via its logical inference reduction process, the "coverage" — percentage out of all observations that exhibit that implication or inference — and the "consistency" — the percentage of observations conforming to that combination of variables having that particular value of the dependent variable or outcome — are calculated and reported, and can be used as indicators of the strength of such an explorative probabilistic inference. In real-life complex societal processes, QCA enables the identification of multiple sets of conditions that are consistently associated with a particular output value in order to explore for causal predictors.
Fuzzy set QCA aims to handle variables, such as GDP per capita, where the number of categories, decimal values of monetary units, becomes too large to use mvQCA, or in cases were uncertainty or ambiguity or measurement error in the classification of a case needs to be acknowledged.
QCA has now become used in many more fields than political science which Ragin first developed the method for.[10] Today the method has been used in: