Monday, June 11, 2007

Paul de Boeck: Always do a PCA

Another of Paul's rules for statistical consultation (rule 5), `Always do a PCA, it tells you about sources of differences in the data and about the interaction between the two modes of the data set', was met with scepticism, notably by David Kaplan, who said that he advised clients never to use PCA.

Comments:
  • PCA is an abbreviation of `principal component analysis'. It is essentially a data reduction technique that requires no assumptions about the distribution of the variables. In a nutshell, the technique represents the data relative to an orthogonal coordinate system. Data reduction is obtained by retaining only a few axes of that coordinate system.
  • Methodologically, the drawback of the technique lies in the orthogonality, which in most cases is not realistic in view of the substantive meaning of the data. To mend this, a promax rotation can be used, which yields non-orthogonal axes.
  • As an alternative to PCA, confirmatory factor analysis (CFA) can be used in an exploratory way, in particular if some assumptions can be made about the relation between factors and items (note that CFA does make several assumptions about the distribution of the observed variables, notably multinormality).
  • Although Paul had to endure heavy criticism of his fifth rule, in my opinion he had a point. In fact, it is common practice among data analysts to use PCA as a quick and dirty technique to explore the data, even if they know how to apply CFA. If promax rotation is used instead of varimax rotation, some of the objections against the orthogonality assumption are mitigated, although not completely met: a CFA on the other half of the data, using the factor structure found with PCA, may result in completely different estimated angles between the factors.
  • For those who heard Paul's talk, the recommendation to use PCA was not surprising: he put heavy emphasis on data exploration as an antidote to the often theory-centered approach that prevails in social science and behavioral research, and PCA can very well be used in an exploratory way. As holds for all exploration, the truth is never ascertained. The analysis has to be confirmed either on another part of the data or by a new, carefully designed experiment that allows for unequivocal confirmation.
  • As a last remark, I want to stress that the use of PCA is not as straightforward as it seems (in particular, if one wants to have some confidence in the results). For a recent article on PCA see Costello and Osborne (2005).
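
To make the first comment concrete, here is a minimal sketch of PCA as a change to an orthogonal coordinate system followed by keeping only a few axes. It is an illustration only, not the procedure used by anyone mentioned above: the function name `pca`, the simulated data set (200 respondents, 6 items, 2 latent sources) and the choice of NumPy's singular value decomposition are all my own assumptions. No rotation (varimax or promax) is applied.

```python
import numpy as np

def pca(data, n_components):
    """PCA via the SVD of the column-centered data matrix (hypothetical helper)."""
    centered = data - data.mean(axis=0)
    # Rows of vt are orthonormal principal axes, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_components]
    # Coordinates of every observation on the retained axes.
    scores = centered @ axes.T
    # Share of the total variance carried by each retained axis.
    explained = singular_values**2 / np.sum(singular_values**2)
    return axes, scores, explained[:n_components]

# Simulated data (an assumption for illustration): 6 observed items that are
# noisy linear mixtures of 2 latent sources.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
data = latent @ loadings + 0.3 * rng.normal(size=(200, 6))

axes, scores, explained = pca(data, n_components=2)
print("share of variance per retained axis:", explained)
print("axes are orthonormal:", np.allclose(axes @ axes.T, np.eye(2)))
```

The orthonormality check at the end is exactly the property objected to in the second comment: the retained axes are orthogonal by construction, whatever the substantive meaning of the items.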

References

Costello, A. B. and J. W. Osborne (2005). Best Practices in Exploratory Factor Analysis: Four Recommendations for Getting the Most From Your Analysis. Practical Assessment, Research & Evaluation, 10 (7).
