In statistics, confirmatory factor analysis (CFA) is a special form of factor analysis, most commonly used in social science research.[1] It is used to test whether measures of a construct are consistent with a researcher's understanding of the nature of that construct (or factor). As such, the objective of confirmatory factor analysis is to test whether the data fit a hypothesized measurement model. This hypothesized model is based on theory and/or previous analytic research.[2] CFA was first developed by Jöreskog (1969)[3] and has built upon and replaced older methods of analyzing construct validity such as the MTMM Matrix as described in Campbell & Fiske (1959).[4]
In confirmatory factor analysis, the researcher first develops a hypothesis about what factors they believe are underlying the measures used (e.g., "Depression" being the factor underlying the Beck Depression Inventory and the Hamilton Rating Scale for Depression) and may impose constraints on the model based on these a priori hypotheses. By imposing these constraints, the researcher is forcing the model to be consistent with their theory. For example, if it is posited that there are two factors accounting for the covariance in the measures, and that these factors are unrelated to each other, the researcher can create a model where the correlation between factor A and factor B is constrained to zero. Model fit measures could then be obtained to assess how well the proposed model captured the covariance between all the items or measures in the model. If the constraints the researcher has imposed on the model are inconsistent with the sample data, then the results of statistical tests of model fit will indicate a poor fit, and the model will be rejected. If the fit is poor, it may be due to some items measuring multiple factors. It might also be that some items within a factor are more related to each other than others.
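The effect of such a constraint can be illustrated with a minimal sketch in Python with NumPy. The loading values, the function name `implied_covariance` and the 0.4 correlation are invented for illustration and are not taken from any study. The sketch assumes standardized items, so that the implied matrix is the common part (loadings times factor correlations times loadings transposed) plus unique variances that bring each diagonal element to 1. With the factor correlation fixed to zero, every implied covariance between an item of factor A and an item of factor B is forced to zero.

```python
import numpy as np

# Hypothetical standardized loadings: items 1-3 load on factor A, items 4-6 on factor B.
# The zeros are the a priori constraints (no cross-loadings).
loadings = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.9],
    [0.0, 0.7],
    [0.0, 0.5],
])

def implied_covariance(loadings, factor_corr):
    """Model-implied correlation matrix for a two-factor model with unit item variances."""
    phi = np.array([[1.0, factor_corr], [factor_corr, 1.0]])
    common = loadings @ phi @ loadings.T
    unique = np.diag(1.0 - np.diag(common))  # unique variances chosen so the diagonal is 1
    return common + unique

sigma_orthogonal = implied_covariance(loadings, factor_corr=0.0)
sigma_correlated = implied_covariance(loadings, factor_corr=0.4)

# Cross-factor block: covariances between items of A (rows 0-2) and items of B (columns 3-5).
print(sigma_orthogonal[:3, 3:])  # all zeros under the constraint
print(sigma_correlated[:3, 3:])  # nonzero once the constraint is relaxed
```

If the sample data show substantial covariances in that cross-factor block, the constrained model cannot reproduce them, which is what the fit statistics would then reflect.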
For some applications, the requirement of "zero loadings" (for indicators not supposed to load on a certain factor) has been regarded as too strict. A newly developed analysis method, "exploratory structural equation modeling", specifies hypotheses about the relation between observed indicators and their supposed primary latent factors while allowing for estimation of loadings with other latent factors as well.[5]
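In confirmatory factor analysis, researchers are typically interested in studying the degree to which responses on a p × 1 vector of observable random variables can be used to assign a value to one or more unobserved variable(s) $\xi$. The observed responses are modeled as a function of the latent variable(s) $\xi$:

$$Y = \Lambda\xi + \epsilon$$

where $Y$ is the p × 1 vector of observed random variables, $\xi$ is the vector of unobserved latent variables, and $\Lambda$ is the p × k matrix of factor loadings, with k equal to the number of latent variables. Because $Y$ is an imperfect measure of $\xi$, the model also includes an error term, $\epsilon$. In the maximum likelihood (ML) case, estimates are obtained by iteratively minimizing the fit function

$$F_{ML} = \ln\left|\Lambda\Omega\Lambda' + I - \operatorname{diag}(\Lambda\Omega\Lambda')\right| + \operatorname{tr}\left(R\left(\Lambda\Omega\Lambda' + I - \operatorname{diag}(\Lambda\Omega\Lambda')\right)^{-1}\right) - \ln|R| - p$$

where $\Lambda\Omega\Lambda' + I - \operatorname{diag}(\Lambda\Omega\Lambda')$ is the variance-covariance matrix implied by the proposed factor model (with $\Omega$ denoting the covariance matrix of the latent variables) and $R$ is the observed variance-covariance matrix.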
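The fit function can be evaluated numerically, as in the following minimal sketch in Python with NumPy. The function names and the numeric values are hypothetical. The sketch assumes that Λ is the loading matrix, Ω the covariance matrix of the latent variables and R the observed matrix, and it only evaluates the function for given parameter values; it does not perform the iterative minimization that estimation software carries out.

```python
import numpy as np

def implied_matrix(lam, omega):
    """Model-implied matrix: Lambda Omega Lambda' + I - diag(Lambda Omega Lambda')."""
    common = lam @ omega @ lam.T
    return common + np.eye(lam.shape[0]) - np.diag(np.diag(common))

def f_ml(lam, omega, r):
    """ML discrepancy between the implied matrix and the observed matrix R."""
    sigma = implied_matrix(lam, omega)
    p = r.shape[0]
    _, logdet_sigma = np.linalg.slogdet(sigma)  # log-determinant of the implied matrix
    _, logdet_r = np.linalg.slogdet(r)          # log-determinant of the observed matrix
    return logdet_sigma + np.trace(r @ np.linalg.inv(sigma)) - logdet_r - p

lam = np.array([[0.8], [0.7], [0.6]])  # hypothetical loadings on a single factor
omega = np.array([[1.0]])              # factor variance fixed to 1

r_exact = implied_matrix(lam, omega)   # an observed matrix the model reproduces exactly
r_other = np.array([[1.0, 0.3, 0.5],
                    [0.3, 1.0, 0.2],
                    [0.5, 0.2, 1.0]])

print(f_ml(lam, omega, r_exact))  # approximately zero: implied and observed matrices coincide
print(f_ml(lam, omega, r_other))  # positive: the model does not reproduce this matrix
```

The function is zero when the implied and observed matrices are identical and grows as they diverge, which is why minimizing it yields parameter values that bring the two matrices as close together as the model allows.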
Although numerous algorithms have been used to estimate CFA models, maximum likelihood (ML) remains the primary estimation procedure.[7] That being said, CFA models are often applied to data conditions that deviate from the normal theory requirements for valid ML estimation. For example, social scientists often estimate CFA models with non-normal data and indicators scaled using discrete ordered categories.[8] Accordingly, alternative algorithms have been developed that attend to the diverse data conditions applied researchers encounter. The alternative estimators have been characterized into two general types: (1) robust and (2) limited information estimators.[9]
When ML is implemented with data that deviate from the assumptions of normal theory, CFA models may produce biased parameter estimates and misleading conclusions.[10] Robust estimation typically attempts to correct the problem by adjusting the normal theory model χ² and standard errors. For example, Satorra and Bentler (1994) recommended using ML estimation in the usual way and subsequently dividing the model χ² by a measure of the degree of multivariate kurtosis.[11] An added advantage of robust ML estimators is their availability in common structural equation modeling (SEM) software (e.g., lavaan).[12]
Unfortunately, robust ML estimators can become untenable under common data conditions. In particular, when indicators are scaled using few response categories (e.g., disagree, neutral, agree) robust ML estimators tend to perform poorly. Limited information estimators, such as weighted least squares (WLS), are likely a better choice when manifest indicators take on an ordinal form.[13] Broadly, limited information estimators attend to the ordinal indicators by using polychoric correlations to fit CFA models.[14] Polychoric correlations capture the covariance between two latent variables when only their categorized form is observed, which is achieved largely through the estimation of threshold parameters.[15]
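The threshold step can be sketched as follows in Python, using only the standard library. This is an illustration under the common assumption that a standard normal latent response underlies each ordinal item, in which case each threshold is the inverse normal CDF of the cumulative response proportion. The function name and the counts are hypothetical, and the sketch covers only the thresholds, not the estimation of the polychoric correlation itself.

```python
from statistics import NormalDist

def estimate_thresholds(counts):
    """Thresholds on a standard normal latent response from ordered category counts.

    counts: number of responses in each ordered category, lowest category first.
    Returns one threshold per boundary between adjacent categories.
    Note: an empty lowest category gives a cumulative proportion of 0,
    for which inv_cdf raises an error.
    """
    total = sum(counts)
    thresholds = []
    cumulative = 0
    for count in counts[:-1]:  # the highest category has no upper threshold
        cumulative += count
        thresholds.append(NormalDist().inv_cdf(cumulative / total))
    return thresholds

# Hypothetical counts for a three-category item: disagree, neutral, agree.
counts = [20, 30, 50]
print(estimate_thresholds(counts))
```

An item with three response categories yields two thresholds, which cut the latent response into the three observed categories.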
See main article: Exploratory factor analysis. Both exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) are employed to understand shared variance of measured variables that is believed to be attributable to a factor or latent construct. Despite this similarity, however, EFA and CFA are conceptually and statistically distinct analyses.
The goal of EFA is to identify factors based on data and to maximize the amount of variance explained.[16] The researcher is not required to have any specific hypotheses about how many factors will emerge, and what items or variables these factors will comprise. If these hypotheses exist, they are not incorporated into and do not affect the results of the statistical analyses. By contrast, CFA evaluates a priori hypotheses and is largely driven by theory. CFA analyses require the researcher to hypothesize, in advance, the number of factors, whether or not these factors are correlated, and which items/measures load onto and reflect which factors.[17] As such, in contrast to exploratory factor analysis, where all loadings are free to vary, CFA allows for the explicit constraint of certain loadings to be zero.
EFA is often considered to be more appropriate than CFA in the early stages of scale development because CFA does not show how well items load on the non-hypothesized factors.[18] Another strong argument for the initial use of EFA is that misspecification of the number of factors at an early stage of scale development will typically not be detected by confirmatory factor analysis. At later stages of scale development, confirmatory techniques may provide more information through the explicit contrast of competing factor structures.
EFA is sometimes reported in research when CFA would be a better statistical approach.[19] It has been argued that CFA can be restrictive and inappropriate when used in an exploratory fashion.[20] However, the idea that CFA is solely a “confirmatory” analysis may sometimes be misleading, as modification indices used in CFA are somewhat exploratory in nature. Modification indices show the improvement in model fit that would result if a particular constrained coefficient were freed.[21] Likewise, EFA and CFA do not have to be mutually exclusive analyses; EFA has been argued to be a reasonable follow-up to a poor-fitting CFA model.[22]
Structural equation modeling software is typically used for performing confirmatory factor analysis. LISREL,[23] EQS,[24] AMOS,[25] Mplus[26] and the lavaan package in R[27] are popular software programs. A Python package is also available.[28] CFA is also frequently used as a first step to assess the proposed measurement model in a structural equation model. Many of the rules of interpretation regarding assessment of model fit and model modification in structural equation modeling apply equally to CFA. CFA is distinguished from structural equation modeling by the fact that in CFA, there are no directed arrows between latent factors. In other words, while in CFA factors are not presumed to directly cause one another, SEM often does specify particular factors and variables to be causal in nature. In the context of SEM, the CFA is often called 'the measurement model', while the relations between the latent variables (with directed arrows) are called 'the structural model'.
See also: Regression validation and Statistical model validation. In CFA, several statistical tests are used to determine how well the model fits to the data. Note that a good fit between the model and the data does not mean that the model is “correct”, or even that it explains a large proportion of the covariance. A “good model fit” only indicates that the model is plausible.[29] When reporting the results of a confirmatory factor analysis, one is urged to report: a) the proposed models, b) any modifications made, c) which measures identify each latent variable, d) correlations between latent variables, e) any other pertinent information, such as whether constraints are used.[30] With regard to selecting model fit statistics to report, one should not simply report the statistics that estimate the best fit, though this may be tempting. Though several varying opinions exist, Kline (2010) recommends reporting the chi-squared test, the root mean square error of approximation (RMSEA), the comparative fit index (CFI), and the standardised root mean square residual (SRMR).
Absolute fit indices determine how well the a priori model fits, or reproduces, the data.[31] Absolute fit indices include, but are not limited to, the chi-squared test, RMSEA, GFI, AGFI, RMR, and SRMR.[32]
The chi-squared test indicates the difference between observed and expected covariance matrices. Values closer to zero indicate a better fit, that is, a smaller difference between the expected and observed covariance matrices. Chi-squared statistics can also be used to directly compare the fit of nested models to the data. One difficulty with the chi-squared test of model fit, however, is that researchers may fail to reject an inappropriate model in small samples and reject an appropriate model in large samples. As a result, other measures of fit have been developed.
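The sample-size dependence can be made concrete with a short sketch in Python. It assumes the common convention that the test statistic equals (N − 1) times the minimized ML fit function value (some software multiplies by N instead). The function names and all numbers are hypothetical.

```python
def model_chi_square(f_min, n):
    """Chi-squared test statistic from a minimized ML fit function value and sample size n."""
    return (n - 1) * f_min

def chi_square_difference(chisq_restricted, df_restricted, chisq_general, df_general):
    """Difference statistic and its degrees of freedom for two nested models."""
    return chisq_restricted - chisq_general, df_restricted - df_general

f_min = 0.05  # hypothetical degree of misfit, held constant across sample sizes
for n in (100, 500, 5000):
    print(n, model_chi_square(f_min, n))

# Hypothetical nested comparison: the restricted model fixes one additional parameter.
print(chi_square_difference(85.0, 35, 60.0, 34))
```

With the degree of misfit held constant, the statistic grows in proportion to the sample size, so the same discrepancy may pass unnoticed in a small sample and lead to rejection in a large one. For nested models, the difference between the two statistics is itself referred to a chi-squared distribution with degrees of freedom equal to the difference in degrees of freedom.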
The root mean square error of approximation (RMSEA) avoids issues of sample size by analyzing the discrepancy between the hypothesized model, with optimally chosen parameter estimates, and the population covariance matrix. The RMSEA ranges from 0 to 1, with smaller values indicating better model fit. A value of .06 or less is indicative of acceptable model fit.[33] [34]
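One commonly used sample formula for the RMSEA is sketched below in Python. The formula, the square root of max(χ² − df, 0) divided by df(N − 1), is an assumption about the convention in use (some software uses N rather than N − 1), and the input values are hypothetical.

```python
import math

def rmsea(chisq, df, n):
    """Sample RMSEA from the model chi-squared, its degrees of freedom and sample size n."""
    return math.sqrt(max(chisq - df, 0.0) / (df * (n - 1)))

print(rmsea(chisq=85.0, df=35, n=300))  # misfit beyond what the degrees of freedom account for
print(rmsea(chisq=30.0, df=35, n=300))  # chi-squared below df: RMSEA is set to zero
```

Because the excess of the chi-squared over its degrees of freedom is divided by both the degrees of freedom and the sample size, the index expresses misfit per degree of freedom rather than total misfit.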
The root mean square residual (RMR) and standardized root mean square residual (SRMR) are the square root of the mean squared discrepancy between the sample covariance matrix and the model covariance matrix. The RMR may be somewhat difficult to interpret, however, as its range depends on the scales of the indicators in the model (this becomes a problem when indicators are measured on different scales; e.g., two questionnaires, one on a 0–10 scale, the other on a 1–3 scale). The standardized root mean square residual removes this difficulty in interpretation, and ranges from 0 to 1, with a value of .08 or less being indicative of an acceptable model.[33]
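The difference between the two indices can be sketched in Python with NumPy. The definitions used here are one common variant, assumed for illustration: residuals are averaged over the unique elements of the matrix, including the diagonal, and the SRMR standardizes each residual by the sample standard deviations of the two variables involved. The matrices are hypothetical.

```python
import numpy as np

def rmr(sample_cov, implied_cov):
    """Root mean square residual over the unique elements (lower triangle with diagonal)."""
    s = np.asarray(sample_cov, dtype=float)
    sigma = np.asarray(implied_cov, dtype=float)
    rows, cols = np.tril_indices(s.shape[0])
    return float(np.sqrt(np.mean((s - sigma)[rows, cols] ** 2)))

def srmr(sample_cov, implied_cov):
    """Standardized RMR: each residual is divided by sqrt(s_ii * s_jj) before averaging."""
    s = np.asarray(sample_cov, dtype=float)
    sigma = np.asarray(implied_cov, dtype=float)
    scale = np.sqrt(np.outer(np.diag(s), np.diag(s)))
    rows, cols = np.tril_indices(s.shape[0])
    return float(np.sqrt(np.mean(((s - sigma) / scale)[rows, cols] ** 2)))

# Hypothetical two-indicator example with unequal variances (4 and 1).
sample = np.array([[4.0, 1.2], [1.2, 1.0]])
implied = np.array([[4.0, 1.0], [1.0, 1.0]])

print(rmr(sample, implied))   # depends on the measurement scales of the indicators
print(srmr(sample, implied))  # expressed in correlation units
```

Rescaling an indicator changes the RMR but leaves the SRMR unchanged, which is why the standardized version is easier to compare across models and data sets.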
The goodness of fit index (GFI) is a measure of fit between the hypothesized model and the observed covariance matrix. The adjusted goodness of fit index (AGFI) corrects the GFI, which is affected by the number of indicators of each latent variable. The GFI and AGFI range between 0 and 1, with a value of over .9 generally indicating acceptable model fit.[35]
Relative fit indices (also called “incremental fit indices”[36] and “comparative fit indices”[37]) compare the chi-square for the hypothesized model to one from a “null”, or “baseline”, model. The null model is almost always one in which all of the variables are uncorrelated and, as a result, has a very large chi-square (indicating poor fit). Relative fit indices include the normed fit index and comparative fit index.
The normed fit index (NFI) analyzes the discrepancy between the chi-squared value of the hypothesized model and the chi-squared value of the null model.[38] However, NFI tends to be negatively biased. The non-normed fit index (NNFI; also known as the Tucker–Lewis index, as it was built on an index formed by Tucker and Lewis, in 1973[39]) resolves some of the issues of negative bias, though NNFI values may sometimes fall beyond the 0 to 1 range. Values for both the NFI and NNFI should range between 0 and 1, with a cutoff of .95 or greater indicating a good model fit.[40]
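The two indices can be computed from the chi-squared values of the hypothesized and null models, as in the following Python sketch. The formulas are the commonly cited ones, and the input values are hypothetical.

```python
def nfi(chisq_model, chisq_null):
    """Normed fit index: proportional reduction in chi-squared relative to the null model."""
    return (chisq_null - chisq_model) / chisq_null

def nnfi(chisq_model, df_model, chisq_null, df_null):
    """Non-normed fit index (Tucker-Lewis index), based on chi-squared per degree of freedom."""
    ratio_null = chisq_null / df_null
    ratio_model = chisq_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)

# Hypothetical null model: chi-squared of 900 on 45 degrees of freedom.
print(nfi(85.0, 900.0))
print(nnfi(85.0, 35, 900.0, 45))
print(nnfi(30.0, 35, 900.0, 45))  # chi-squared below its df: the value exceeds 1
```

Because the NNFI works with chi-squared per degree of freedom, it takes model complexity into account, and it exceeds 1 whenever the model chi-squared is smaller than its degrees of freedom.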
The comparative fit index (CFI) analyzes the model fit by examining the discrepancy between the data and the hypothesized model, while adjusting for the issues of sample size inherent in the chi-squared test of model fit, and the normed fit index. CFI values range from 0 to 1, with larger values indicating better fit. Previously, a CFI value of .90 or larger was considered to indicate acceptable model fit. However, recent studies have indicated that a value greater than .90 is needed to ensure that misspecified models are not deemed acceptable. Thus, a CFI value of .95 or higher is presently accepted as an indicator of good fit.
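A commonly cited formula for the CFI compares the excess of chi-squared over degrees of freedom in the hypothesized model with the same quantity in the null model. The Python sketch below assumes that formula and uses hypothetical values.

```python
def cfi(chisq_model, df_model, chisq_null, df_null):
    """Comparative fit index from model and null-model chi-squared values."""
    d_model = max(chisq_model - df_model, 0.0)              # excess misfit of the model
    d_null = max(chisq_null - df_null, d_model, 0.0)        # excess misfit of the null model
    if d_null == 0.0:
        return 1.0  # neither model shows misfit beyond its degrees of freedom
    return 1.0 - d_model / d_null

# Hypothetical null model: chi-squared of 900 on 45 degrees of freedom.
print(cfi(85.0, 35, 900.0, 45))
print(cfi(30.0, 35, 900.0, 45))  # chi-squared below df: CFI is 1
```

Truncating both quantities at zero is what keeps the index within the 0 to 1 range.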
To estimate the parameters of a model, the model must be properly identified. That is, the number of estimated (unknown) parameters (q) must be less than or equal to the number of unique variances and covariances among the measured variables, p(p + 1)/2. This inequality is known as the "t rule". If there is too little information available on which to base the parameter estimates, then the model is said to be underidentified, and model parameters cannot be estimated appropriately.[41]
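The counting involved in the t rule can be sketched in Python. The function names and the two example models are hypothetical. The rule is a necessary condition only, so a model that satisfies it may still fail to be identified for other reasons.

```python
def unique_moments(p):
    """Number of unique variances and covariances among p measured variables."""
    return p * (p + 1) // 2

def satisfies_t_rule(n_free_params, n_indicators):
    """True if the number of free parameters q does not exceed p(p + 1)/2."""
    return n_free_params <= unique_moments(n_indicators)

# Hypothetical model A: six indicators, two correlated factors with variances fixed to 1.
# Free parameters: 6 loadings + 6 unique variances + 1 factor covariance = 13.
q_a = 6 + 6 + 1
print(unique_moments(6), satisfies_t_rule(q_a, 6))  # 21 unique moments, rule satisfied
print(unique_moments(6) - q_a)                      # degrees of freedom of the model

# Hypothetical model B: one factor (variance fixed to 1) measured by two indicators.
# Free parameters: 2 loadings + 2 unique variances = 4, but only 3 unique moments.
q_b = 2 + 2
print(unique_moments(2), satisfies_t_rule(q_b, 2))  # rule violated: underidentified
```

The difference between the number of unique moments and the number of free parameters gives the degrees of freedom of the model.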
Tanaka, Jeffrey S. (1993). "Multifaceted conceptions of fit in structure equation models". In Bollen, K. A.; Long, J. S. (eds.). Testing structural equation models. Newbury Park, CA: Sage. pp. 136–162. ISBN 0-8039-4506-X.