There are many different reasons for doing an experiment. Some of these are listed below. The purpose of the experiment will determine sample size and design.
Some types of experiment:
Pilot studies
Exploratory experiments
Confirmatory experiments
Experiments to estimate parameters such as means, proportions or dose-responses
Experiments to optimise subsequent experiments
Surveys, correlative studies and epidemiological investigations
Pilot studies are small, usually short-term experiments used to test the logistics of a proposed study, to set dose levels where necessary and to gain preliminary information. They usually involve small numbers of animals (often between one and 20), but there is no formal method of assessing the appropriate number. The results will not normally be presented in a scientific paper but are used in planning future studies.
Exploratory experiments are used to study the patterns of response to some treatment or intervention, without necessarily being based on a formal hypothesis, and may be used to generate hypotheses for more formal testing in “confirmatory” experiments (see below).
Often many characters will be measured and each subjected to a statistical analysis using conventional methods. But when many statistical tests are done, the chance that at least one of them gives a small p-value purely by chance is much higher than the nominal level for a single test. Unadjusted p-values therefore overstate the strength of the evidence, leading to false positive results.
A DNA micro-array experiment, which collects data on the activity of thousands of genes per animal, is usually of an exploratory type. It is unlikely to answer many questions but will usually generate several hypotheses requiring further investigation. Expert advice is often needed in the analysis of the resulting data.
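The multiple-testing problem can be made concrete with a little arithmetic. The sketch below assumes that all tests are independent and that every null hypothesis is true, which is a simplification (gene activities are usually correlated). It also shows the Bonferroni adjustment, one common correction that is not discussed in the text above and is included only as an illustration. The function names are hypothetical.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among n_tests independent tests,
    assuming every null hypothesis is true and each test uses level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the overall false positive
    rate at or below alpha (Bonferroni adjustment)."""
    return alpha / n_tests


for m in (1, 10, 100, 1000):
    print(
        f"{m:>5} tests: P(at least one false positive) = "
        f"{familywise_error_rate(m):.3f}, "
        f"Bonferroni per-test level = {bonferroni_threshold(m):.5f}"
    )
```

With ten tests the chance of at least one false positive is already about 40%, and with the thousands of tests in a micro-array study it is close to certainty unless some adjustment is made.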
The confirmatory experiment is the most common type of experiment. It is used to test a clearly stated hypothesis, which should be formulated before starting the experiment. Usually this is the null hypothesis that there is no difference between treatment means or proportions, with the alternative hypothesis being that there is a difference.
Preferably the hypothesis should be quite simple, involving a single or just a few outcomes.
The results will be analysed to obtain a p-value (the probability, if the null hypothesis is true, of obtaining a result as extreme as or more extreme than the one observed). A significance level is chosen in advance (usually 0.05), and the null hypothesis is rejected if the observed p-value is less than this.
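One way to see what a p-value means is an exact permutation test, sketched below. This is only an illustration and not necessarily the method a given study would use (a t-test or analysis of variance is more usual). The data and the function name are hypothetical. The test asks: if treatment labels were irrelevant, in what fraction of all possible ways of splitting the animals into two groups would the difference between means be at least as large as the one observed?

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(control, treated):
    """Two-sided exact permutation p-value for a difference between two means."""
    pooled = list(control) + list(treated)
    n_control = len(control)
    n_treated = len(pooled) - n_control
    observed = abs(mean(treated) - mean(control))
    total = sum(pooled)

    extreme = 0
    n_splits = 0
    # Enumerate every way of choosing which animals form the control group.
    for idx in combinations(range(len(pooled)), n_control):
        control_sum = sum(pooled[i] for i in idx)
        treated_sum = total - control_sum
        diff = abs(treated_sum / n_treated - control_sum / n_control)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
        n_splits += 1
    return extreme / n_splits


# Hypothetical measurements for four control and four treated animals.
control = [1, 2, 3, 4]
treated = [5, 6, 7, 8]
p = permutation_p_value(control, treated)
print(f"p = {p:.4f}")  # 2 of the 70 possible splits are this extreme
print("reject null hypothesis" if p < 0.05 else "do not reject null hypothesis")
```

Full enumeration is only practical for small groups; with larger groups a random sample of splits is normally used instead.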
Failure to reject the null hypothesis (“the results are not statistically significant”) does not mean that the treatment has no effect, only that this experiment provides no evidence of an effect. Had the experiment been larger or the material more uniform, an effect might have been detected. The power analysis method of sample size determination usually assumes a confirmatory experiment (see Sample size).
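The link between experiment size, uniformity of the material and the chance of detecting an effect can be sketched with a standard power calculation. The example below uses the usual normal approximation for comparing two group means with a two-sided test; it slightly understates the numbers a t-test-based calculation would give, and the function name, effect sizes and standard deviations are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate animals per group needed to detect a true difference
    `delta` between two means, given standard deviation `sd`.

    n = 2 * ((z_{1 - alpha/2} + z_{power}) * sd / delta) ** 2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Halving the standard deviation (more uniform material) cuts the
# required sample size to roughly a quarter.
print(n_per_group(delta=1.0, sd=2.0))  # variable material
print(n_per_group(delta=1.0, sd=1.0))  # more uniform material
```

Because the required number depends on the square of `sd / delta`, small gains in uniformity can save many animals.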
Sometimes it is obvious that there will be a response to a treatment, but the aim of the experiment is to estimate the magnitude of that response. The aim is therefore to estimate one or more parameters such as means or differences between means, proportions or dose-response relationships. In this case the required sample size will depend on how accurately the parameter needs to be estimated (see Sample size).
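For estimation, the sample size follows from the precision required rather than from power. As a rough sketch, assuming the standard deviation is known in advance and using a normal approximation, the number of animals needed to estimate a single mean to within a chosen margin of error is n = (z × sd / margin)². The function name and the numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_for_margin(sd: float, margin: float, confidence: float = 0.95) -> int:
    """Approximate sample size so that the confidence interval for a mean
    has half-width no greater than `margin`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * sd / margin) ** 2)


# Hypothetical example: standard deviation of 10 units.
print(n_for_margin(sd=10, margin=5))  # coarse estimate
print(n_for_margin(sd=10, margin=2))  # tighter estimate needs far more animals
```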
Very similar experiments are sometimes repeated many times. For example, a pharmaceutical company may test a whole series of compounds in an animal model. Optimising such experiments with respect to age, genotype, sex, prior treatment and timing of observations may lead to substantial savings of animals and money. Typically such experiments will examine the effect of several variables at once using factorial experimental designs (see also Experimental designs/Factorials).
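A factorial design simply includes every combination of the chosen factor levels. The sketch below enumerates the treatment combinations for three two-level factors; the factor names and levels are hypothetical and chosen only to echo the variables mentioned above.

```python
from itertools import product

# Hypothetical factors and levels for a 2 x 2 x 2 factorial design.
factors = {
    "sex": ["female", "male"],
    "age": ["young", "old"],
    "dose": ["control", "treated"],
}


def factorial_design(factors):
    """Return one dict per treatment combination (full factorial)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


design = factorial_design(factors)
for i, combination in enumerate(design, start=1):
    print(i, combination)
print(f"{len(design)} treatment combinations")
```

Each animal contributes information about all three factors at once, which is why a factorial design can be more economical than studying one variable at a time.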
Surveys and other observational studies are done when it is not possible to manipulate the experimental material so as to impose different treatments. For example, an experiment to determine the effects of smoking on human health is not possible because people cannot be assigned at random to smoking and non-smoking groups. All that can be achieved is an estimate of any association between the two.
The main problem with surveys is that the groups being compared may not be exactly comparable in other unsuspected ways. There may, for example, be some personality traits that lead both to smoking and ill health. In this particular example there is now sufficient weight of evidence to make us reasonably sure that smoking is a major cause of ill health, but possible confounding variables always need to be taken into account.
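How a confounding variable can create an association is easiest to see in a small simulation. The sketch below is entirely hypothetical: a hidden trait makes both an exposure and an outcome more likely, while the exposure itself has no effect on the outcome at all. All probabilities, names and the sample size are assumptions made for illustration and say nothing about any real exposure.

```python
import random


def simulate(n=50_000, seed=0):
    """Simulate subjects in whom a hidden trait drives both exposure and outcome."""
    rng = random.Random(seed)
    # counts[(trait, exposed)] = [number of subjects, number with the outcome]
    counts = {(t, e): [0, 0] for t in (False, True) for e in (False, True)}
    for _ in range(n):
        trait = rng.random() < 0.5
        exposed = rng.random() < (0.8 if trait else 0.2)
        # The outcome depends on the trait only, never on the exposure.
        outcome = rng.random() < (0.6 if trait else 0.1)
        cell = counts[(trait, exposed)]
        cell[0] += 1
        cell[1] += outcome
    return counts


def rate(counts, exposed, trait=None):
    """Outcome rate among exposed or unexposed subjects, optionally within one trait group."""
    cells = [
        v for (t, e), v in counts.items()
        if e == exposed and (trait is None or t == trait)
    ]
    return sum(c[1] for c in cells) / sum(c[0] for c in cells)


counts = simulate()
crude = rate(counts, True) - rate(counts, False)
within = [rate(counts, True, t) - rate(counts, False, t) for t in (False, True)]
print(f"crude difference in outcome rate: {crude:.3f}")
print("difference within each trait group:", [f"{d:.3f}" for d in within])
```

Under these assumed probabilities the crude comparison shows a large difference between exposed and unexposed subjects, yet the difference nearly vanishes once subjects are compared within the same trait group. In a real survey the confounder may be unmeasured or unsuspected, so this kind of stratification is not always available.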
Some surveys have the appearance of true experiments. A laboratory study of differences in learning ability in different strains of rats is a survey rather than a true experiment because “strain” cannot be assigned at random to different individuals. Special care is needed when studying “classification variables” such as sex, strain and (in some cases) age, to ensure that groups do not differ in other ways.