Thursday, April 26, 2007

Small sample SEM

An issue that came up during a session on applications of structural equation modelling (SEM), following a lecture by David Kaplan at the Colloquium `Advising on research methods' in Amsterdam (29-30 March), was that SEM is applied less often in medical/epidemiological research than it could be. Several bottlenecks can be (and were) identified:

  1. The concept of `latent variable' (and, relatedly, the concept of (substantive) theory and corresponding model) seems harder to conceptualize in Medicine than in the social/behavioral sciences.
  2. Jules Ellis mentioned that SEM software is costly and not easily obtainable, and that specifying a SEM model in standard software is often awkward and difficult for the medical researcher/analyst.
  3. Most SEM models require sample sizes that are too large for most small- to moderately-sized medical/epidemiological research projects.

Comments:

Ad 1 (The concept of `latent variable' seems harder to conceptualize in Medicine than in the social/behavioral sciences).

Phrased this way, this is of course an interesting but rather philosophical point. In real-life medical/epidemiological research there often seems to be no compelling reason to start from a theoretical position. I won't go into why this is so.

However, there are several situations in which a theoretical conceptualization could help to specify a SEM model that is more appropriate than the regression-like models that are frequently used. For instance,

  • In what is called `fundamental' or `basic' medical research, where complicated dynamic mechanisms are studied (think of genome studies or physiological studies of illness progression), dynamic SEM models could be used to specify the underlying process.
  • In questionnaire design, confirmatory factor analysis can be used to analyze the factor structure, just as is done in social science research (see de Vet et al. (2005)).

Ad 2 (SEM software is costly and specifying a SEM model in standard software is too difficult for the medical researcher/analyst).

In an e-mail afterwards, David Kaplan mentioned that a package for structural equation modelling exists in R (a free statistical environment closely related to S-PLUS, obtainable via the Internet; see, e.g., http://cran.nedmirror.nl/). The package, sem, turns out to have been written by John Fox. Sources and binaries can be obtained at: http://finzi.psych.upenn.edu/R/library/sem/html/00Index.html

As to the other point, that the syntax of model specification would be too complicated for the researcher-in-the-field: after all, many of them have also managed to learn how to use multilevel analysis. So where there is a will, there is a way, even in learning how to use SEM, particularly if there is a clear need.

Ad 3 (Most SEM models require sample sizes that are too large).

To me, this seems the most important bottleneck for standard application in Medicine/Epidemiology. If SEM requires sample sizes of at least 500, its application in most medical studies is out of the question.

In his comments, Jelte Wicherts mentioned that techniques for small samples are being studied. Afterwards, in some e-mail exchanges, I suggested that resampling techniques like bootstrapping could be applied in small-sample situations. As it turns out, Fox's package, mentioned above, also offers a bootstrapping option (but see Kaplan's book chapter 5, where arguments are given why sample sizes would increase when non-normality has to be assumed).
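To make the bootstrapping suggestion a little more concrete, here is a minimal sketch in Python of the nonparametric (case-resampling) bootstrap for the standard error of a single statistic. It is not the implementation in Fox's sem package and it does not fit a SEM: the statistic is simply a Pearson correlation standing in for a model parameter. The function names `pearson_r` and `bootstrap_se`, the number of resamples and the simulated data are all illustrative assumptions.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_se(xs, ys, stat, n_boot=2000, seed=1):
    """Case-resampling bootstrap standard error of stat(xs, ys).

    Hypothetical helper for illustration; returns (se, replicates).
    """
    rng = random.Random(seed)
    n = len(xs)
    replicates = []
    for _ in range(n_boot):
        # Draw n cases with replacement, keeping each (x, y) pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        try:
            replicates.append(stat(bx, by))
        except ZeroDivisionError:
            # A resample with no variance in x or y has an undefined
            # correlation; skip it.
            continue
    # The spread of the replicates estimates the sampling variability.
    return statistics.stdev(replicates), replicates


if __name__ == "__main__":
    # Hypothetical small sample (n = 30), simulated for illustration only.
    data_rng = random.Random(42)
    x = [data_rng.gauss(0, 1) for _ in range(30)]
    y = [0.5 * xi + data_rng.gauss(0, 1) for xi in x]
    r = pearson_r(x, y)
    se, _ = bootstrap_se(x, y, pearson_r)
    print(f"r = {r:.3f}, bootstrap SE = {se:.3f}")
```

For a SEM, the same loop would refit the whole model on each resample and collect the parameter estimates. With very small samples some resamples may fail to converge, so the bootstrap does not remove the small-sample problem altogether.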

Comment by David Kaplan:

I'm not sure what you mean by:


(but see Kaplan's book chapter 5, where arguments are given why sample sizes would increase when non-normality has to be assumed).

I think you mean that when estimating models with non-normal observed variables, larger sample sizes are typically needed for estimators to behave properly. That is partly true, and was true in the good old days. But now, there are estimators that don't require huge sample sizes. Also, I believe there are bootstrapping approaches to get standard errors when sample sizes are a bit smaller.

References

de Vet, H. C., Adèr, H. J., Terwee, C. B., & Pouwer, F. (2005). Are factor analytical techniques used appropriately in the validation of health status questionnaires? A systematic review on the quality of factor analysis of the SF-36. Quality of Life Research, 14(5), 1203–1218.

Kaplan, D. (2000). Structural equation modeling: Foundations and extensions. Thousand Oaks, London, New Delhi: Sage Publications.

Wednesday, April 4, 2007

Colloquium `Advising on research methods'

This colloquium was organised by Don Mellenbergh and myself and held on March 29-30 in Amsterdam, the Netherlands.

The speakers were:

  • Janice Derr (Advising in a multi-disciplinary setting)
  • Steven Piantadosi (Research designs in Medicine)
  • Don Mellenbergh (Advising on test construction)
  • Gerald van Belle (Statistics and everyday life)
  • Jules Ellis (Advising to policy makers in Health Care)
  • Bo Lu (Bias correction using propensity scores)
  • Paul de Boeck (Consulting in behaviour research)
  • Willem Heiser (Survival skills in publishing)
  • Robert Pool (Combining qualitative and quantitative methods)
  • David Kaplan (Research problem and structural equation model)
  • Denny Borsboom (Advising on test validity)
  • Herman Adèr (Time and strategy)

The colloquium was preceded by a masterclass on Wednesday, March 28.

For more details, see: http://www.knaw.nl/colloquia/advising/

In the next few posts, I will describe some of the topics that were discussed.

Herman