Thursday, April 26, 2007

Small sample SEM

One issue raised during a session on applications of structural equation modelling (SEM), following a lecture by David Kaplan at the colloquium 'Advising on research methods' in Amsterdam (29-30 March), was that SEM is applied less often in medical/epidemiological research than it could be. Several bottlenecks can be (and were) identified:

  1. The concept of 'latent variable' (and, relatedly, the concept of (substantive) theory and corresponding model) seems to be less easy to conceptualize in medicine than in the social/behavioral sciences.
  2. Jules Ellis mentioned that SEM software is costly and not easily obtainable, and that specifying a SEM model in standard software is often awkward and difficult for the medical researcher/analyst.
  3. Most SEM models require sample sizes that are too large for most small to moderately sized medical/epidemiological research projects.

Comments:

Ad 1 (The concept of 'latent variable' seems to be less easy to conceptualize in medicine than in the social/behavioral sciences).

Phrased in this way, this is of course an interesting but more or less philosophical point. In real-life medical/epidemiological research there often seems to be no compelling reason to start from a theoretical position. I won't go into the reasons for this.

However, there are several situations in which a theoretical conceptualization could help to specify a SEM model that is more appropriate than the regression-like models that are frequently used. For instance,

  • In what is called 'fundamental' or 'basic' medical research, where complicated dynamic mechanisms are studied (think of genome studies or physiological studies of illness progression), dynamic SEM models could be used to specify the dynamic process.
  • In questionnaire design, confirmatory factor analysis can be used to analyze the factor structure, just as is done in social science research (see de Vet et al. (2005)).
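
To make the idea of a latent variable in questionnaire design a bit more concrete, here is a minimal sketch in Python of the covariance matrix implied by a one-factor confirmatory model, Sigma = phi * lambda * lambda' + Theta. The function name, the four items, and all numerical values are my own illustrative assumptions; they are not taken from de Vet et al. (2005) or from any SEM package.

```python
import numpy as np

def implied_covariance(loadings, factor_var, unique_vars):
    """Model-implied covariance of a one-factor model (hypothetical helper).

    Sigma = factor_var * lambda lambda' + diag(unique_vars)
    """
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)   # column vector of loadings
    theta = np.diag(np.asarray(unique_vars, dtype=float))    # unique (error) variances
    return factor_var * (lam @ lam.T) + theta

# Four hypothetical questionnaire items measuring one latent construct.
# The unique variances are chosen so that every item has variance 1.
loadings = [0.8, 0.7, 0.6, 0.5]
unique_vars = [0.36, 0.51, 0.64, 0.75]

sigma = implied_covariance(loadings, factor_var=1.0, unique_vars=unique_vars)
print(np.round(sigma, 2))
```

Fitting a confirmatory factor model then amounts to choosing the loadings and variances such that this implied matrix is as close as possible to the observed covariance matrix of the items.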

Ad 2 (SEM software is costly and specifying a SEM model in standard software is too difficult for the medical researcher/analyst).

In an e-mail afterwards, David Kaplan mentioned that a package to perform SEM modelling exists in R (a statistical package related to S-PLUS, freely obtainable via the Internet, see, for instance: http://cran.nedmirror.nl/). It turned out to be written by John Fox. Sources and binaries for the sem package can be obtained at: http://finzi.psych.upenn.edu/R/library/sem/html/00Index.html

As to the other point, that the syntax of model specification would be too complicated for the researcher-in-the-field: after all, many of them have also been able to learn how to use multilevel analysis. So, where there is a will, there is a way, even in learning how to use SEM, in particular if there is a clear need.

Ad 3 (Most SEM models require sample sizes that are too large).

To me, this seems the most important bottleneck for standard application in medicine/epidemiology. If SEM requires sample sizes of at least 500, application in most medical studies is out of the question.

In his comments, Jelte Wicherts mentioned that small sample size techniques are being studied. Afterwards, in some e-mail exchanges, I suggested that resampling techniques like bootstrapping could be applied in small sample situations. As it turns out, Fox's package, mentioned above, also offers a bootstrapping facility (but see Kaplan's book chapter 5, where arguments are given why sample sizes would increase when non-normality has to be assumed).
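
To illustrate what bootstrapping does in a small sample, here is a minimal sketch in Python. It is not the bootstrap routine of Fox's package; the helper names, the simulated data (n = 40), and the use of a simple regression slope as a stand-in for a single path coefficient are all my own assumptions. The idea is only to show the mechanics: resample cases with replacement, re-estimate the parameter each time, and take the standard deviation of the estimates as the standard error.

```python
import numpy as np

def bootstrap_se(data, estimator, n_boot=2000, seed=0):
    """Nonparametric bootstrap standard error of estimator(data) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # draw n cases with replacement
        estimates[b] = estimator(data[idx])   # re-estimate on the resampled cases
    return estimates.std(ddof=1)

def slope(data):
    """OLS slope of column 1 on column 0, a stand-in for one path coefficient."""
    x, y = data[:, 0], data[:, 1]
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Simulated small sample; the values are illustrative only.
rng = np.random.default_rng(1)
x = rng.normal(size=40)
y = 0.5 * x + rng.normal(size=40)
sample = np.column_stack([x, y])

se = bootstrap_se(sample, slope)
print(f"slope = {slope(sample):.3f}, bootstrap SE = {se:.3f}")
```

In a real SEM application the estimator would refit the whole model on each resampled data set, which is slower but follows the same pattern.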

Comment by David Kaplan:

I'm not sure what you mean by:


(but see Kaplan's book chapter 5, where arguments are given why sample sizes would increase when non-normality has to be assumed).

I think you mean that when estimating models with non-normal observed variables, larger sample sizes are typically needed for estimators to behave properly. That is partly true, and was true in the good old days. But now, there are estimators that don't require huge sample sizes. Also, I believe there are bootstrapping approaches to get standard errors when sample sizes are a bit smaller.

References

de Vet, H. C., Ader, H. J., Terwee, C. B., & Pouwer, F. (2005). Are factor analytical techniques used appropriately in the validation of health status questionnaires? A systematic review on the quality of factor analysis of the SF-36. Quality of Life Research, 14(5), 1203–1218.

Kaplan, D. (2000). Structural Equation Modeling: Foundations and Extensions. Thousand Oaks, London, New Delhi: Sage Publications.
