The usual procedure would be to test whether the data D are consistent with models M1, M2 and/or M3.
But we could also proceed as follows:
Generate data according to the models M1, M2 and M3, resulting in three data sets D1, D2 and D3, and test whether these data sets could have arisen from the same population as D.
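As a minimal sketch of this procedure, assume three hypothetical univariate normal models (the model specifications, sample sizes, and the way D itself is generated here are all illustrative assumptions, not part of the original note). Each model simulates a data set Di, which is then compared with D via a two-sample Kolmogorov-Smirnov distance and a permutation test of the hypothesis that D and Di come from the same population:

```python
import numpy as np

rng = np.random.default_rng(0)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov distance: max |F_a(x) - F_b(x)|."""
    a, b = np.sort(a), np.sort(b)
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / len(a)
    cdf_b = np.searchsorted(b, pooled, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

def permutation_pvalue(a, b, n_perm=500):
    """Monte Carlo p-value for H0: a and b come from the same population."""
    observed = ks_statistic(a, b)
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Observed data D (here itself simulated, purely for illustration).
D = rng.normal(loc=0.5, scale=1.0, size=200)

# Hypothetical models M1, M2, M3, each a data-generating function.
models = {
    "M1": lambda n: rng.normal(loc=0.0, scale=1.0, size=n),
    "M2": lambda n: rng.normal(loc=0.5, scale=1.0, size=n),
    "M3": lambda n: rng.normal(loc=0.0, scale=2.0, size=n),
}

# Generate D1, D2, D3 and test each against D.
results = {}
for name, simulate in models.items():
    Di = simulate(len(D))
    results[name] = permutation_pvalue(D, Di)
    print(f"{name}: permutation p-value = {results[name]:.3f}")
```

In a real application the models would come from the theory T, D would be the observed data, and the univariate KS distance would likely be replaced by a test statistic suited to the (possibly multivariate) structure of the data.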
Remarks:
- The above is only possible if we have a strong theory T on which we can base our models beforehand.
- A methodological advantage is that the researcher is forced to formulate theoretical concepts and translate them into models before the experiment starts.
- A second advantage is that deviations between D and Di (i = 1, 2, 3) give information both on the relationships between variables and on the influence of the underlying (possibly multivariate) distributions (assuming that our models are based on known theoretical distributions such as the normal distribution, which is common practice).
- A third advantage seems to be that we can directly test the alternative hypothesis.
- This procedure cannot be combined with cross-validation (randomly splitting the data into two parts, one part to find models consistent with the data, another part to test those models), because in the first part, models are formulated that are consistent with (possibly multivariate) distributional violations in the data; the same violations are present in the second part of the data, too.
Questions:
- Does a weak theory simply translate into a larger set of models?
- Simulating data based on models M1, M2 and M3 may not be trivial. Can we use procedures similar to those used in MCMC (Markov chain Monte Carlo)?
- Can we use a Bayesian perspective, for instance by assuming that D1, D2 and D3 are based on prior distributions for the data D?
- Is the above approach known and described in the `simulation community'?
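One way the Bayesian question above could be made concrete is a prior predictive check: draw model parameters from a prior, simulate a replicate data set, and see where a summary statistic of D falls in the resulting distribution. The normal model, the specific priors, and the use of the sample mean as the summary statistic below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Observed data D (hypothetical, for illustration).
D = rng.normal(loc=0.5, scale=1.0, size=200)

def prior_predictive_means(n_draws=1000, n=200):
    """Simulate replicate data sets from the prior predictive distribution
    of a normal model and record each replicate's sample mean."""
    means = np.empty(n_draws)
    for k in range(n_draws):
        mu = rng.normal(0.0, 1.0)                 # assumed prior on the mean
        sigma = abs(rng.normal(0.0, 1.0)) + 0.1   # assumed prior on the sd
        rep = rng.normal(mu, sigma, size=n)
        means[k] = rep.mean()
    return means

means = prior_predictive_means()
# Where does the observed mean fall in the prior predictive distribution?
tail = np.mean(means >= D.mean())
print(f"fraction of prior predictive means >= observed mean: {tail:.3f}")
```

A tail fraction close to 0 or 1 would indicate that the prior predictive distribution, and hence the model-plus-prior, is hard to reconcile with D.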