In these simple and flexible designs the experimental units (e.g. cages of animals, litters, or individual animals) are assigned to the treatments at random, regardless of their characteristics.
If the experimental material is heterogeneous (e.g. animals vary a lot in weight or age) a randomised block design might be better. If using mice or rats, isogenic strains should be used where possible, and animals of all species should be free of disease and matched for age and weight. They should all have been reared under identical conditions, and once they have been assigned to their treatments they should be housed, treated and measured in random order (i.e. not by treatment group).
Assuming a measurement outcome, these designs are preferably analysed using a one-way analysis of variance (ANOVA), provided the assumptions are met.
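The random assignment and random processing order described above can be sketched in a few lines of Python. This is only an illustration: the animal identifiers, the treatment names, the seed and the helper `randomise` are all hypothetical, and equal group sizes are assumed.

```python
import random


def randomise(units, treatments, seed=None):
    """Assign units to treatments in equal numbers, purely at random.

    Returns the allocation (unit -> treatment) and a separate random
    order in which the units should be housed, treated and measured.
    """
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide equally among the treatments")
    rng = random.Random(seed)

    # Shuffle a copy of the units, then deal them out in equal blocks.
    shuffled = list(units)
    rng.shuffle(shuffled)
    per_group = len(shuffled) // len(treatments)
    allocation = {}
    for i, treatment in enumerate(treatments):
        for unit in shuffled[i * per_group:(i + 1) * per_group]:
            allocation[unit] = treatment

    # An independent shuffle gives the processing order, so that animals
    # are not handled treatment group by treatment group.
    order = list(units)
    rng.shuffle(order)
    return allocation, order


# Hypothetical identifiers and treatment names, for illustration only.
animals = [f"mouse_{i:02d}" for i in range(1, 13)]
treatments = ["control", "treatment_A", "treatment_B"]
allocation, order = randomise(animals, treatments, seed=1)

for unit in order:
    print(unit, allocation[unit])
```

Fixing the seed makes the allocation reproducible, which helps when the randomisation has to be documented.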
Two numerical examples are given with the data being analysed by an ANOVA. These involve real data with the usual complications.
- In the first example there are two outliers, with discussion about how these should be treated.
- The second example is of a survey of mouse strain susceptibility to the induction of lung tumours, with a discussion of the problems in sample size determination, and the need for a data transformation to compensate for unequal variation in each group.
The data in the table below came from a preliminary study as part of an investigation into the use of laser scanning cytometry in assessing the mouse micronucleus genotoxicity assay (Styles JA, Clark H, Festing MF, Rew DA. 2001. Cytometry 44:153-155).
Age-matched, SPF female BALB/c mice from a commercial company were acclimatised for two weeks in groups of four per cage and then assigned at random to treatment with urethane, 3-methylcholanthrene or saline (control) by intra-peritoneal injection. After an appropriate time the number of micronuclei was expressed as counts per 1000 erythrocytes. Animals were assessed in random order and the investigator was blind with respect to treatment.
In this case the aim is to see whether the mean micronucleus counts differ between the three groups. This can be assessed using a one-way analysis of variance followed by an appropriate post-hoc comparison to see which of the carcinogen-treated groups differ from the control group. Note that the mouse is the experimental unit (since mice were individually treated) but the mice were housed four per cage. There may have been some cage effects, but these have been ignored in this analysis.
A dotplot showing individual observations is given below. From this, there does not appear to be much difference between groups 1 and 3, and group 2 seems to be more variable due largely to two outliers with very high counts.
Such outliers should always be checked against the original data in case they are due to transcription errors. In this case they are valid.
The problem with outliers is that they inflate the standard deviation, so reducing the power of the experiment, and may bias individual treatment means. One strategy for dealing with them is to do the analysis both with and without them to see if it makes any difference to the conclusions. For the moment they have been kept.
The next step is to do a preliminary ANOVA to see if the assumptions underlying the ANOVA are met. These assumptions are that the residuals (deviation of each observation from its group mean) have a normal distribution, and the variation is the same in each group. MINITAB version 14 produces Residual Plots as shown here which can be used to study these assumptions.
The Normal Probability Plot should give a straight line if the residuals have a normal distribution. In this case the two outliers show up clearly, and the residuals deviate slightly from a straight line, due largely to the two outliers. It is possible to do a formal statistical test of the normality of the residuals, but this is probably too sensitive. The ANOVA is quite tolerant of deviations from the two assumptions. The plot of Residuals Versus Fitted Values (top right) shows the residuals plotted individually against the fits (the group means). Again the two outliers show up clearly, with group 2 being a bit more variable than the other two groups. The histogram of the residuals is generally not very helpful with such small numbers, and the Residuals Versus Order plot should show no obvious pattern, as in this case.
An ANOVA and Dunnett’s test (the test appropriate for comparing treated means with a control mean) which includes the two outliers is given below. It shows that there are highly significant differences in mean micronucleus counts among the three groups (p<0.0005), but Dunnett’s test shows that whereas the controls differ from group 2 (urethane), they do not differ from group 3 at the 5% level of significance.
A re-analysis of the data excluding the two outliers (not shown here) results in exactly the same conclusions, so the two outliers can be retained in this case as removing them would make no difference to the conclusions.
One-way ANOVA: Micronuclei versus Group

|Source|DF|SS|MS|F|P|
|---|---|---|---|---|---|
|Group|2|22.196|11.098|14.00|0.000|
|Error|33|26.165|0.793| | |
|Total|35|48.361| | | |

Pooled StDev = 0.8904
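The entries in the ANOVA table are linked by simple arithmetic, which the following short Python sketch reproduces from the sums of squares and degrees of freedom printed by MINITAB. The variable names are illustrative, and the p-value is not recomputed because that would need an F-distribution routine from a statistics library.

```python
import math

# Sums of squares and degrees of freedom copied from the MINITAB output.
ss_group, df_group = 22.196, 2
ss_error, df_error = 26.165, 33

# Each mean square is its sum of squares divided by its degrees of freedom.
ms_group = ss_group / df_group
ms_error = ss_error / df_error

# The F ratio compares between-group with within-group variation.
f_ratio = ms_group / ms_error

# The pooled standard deviation is the square root of the error mean square.
pooled_sd = math.sqrt(ms_error)

print(f"MS group  = {ms_group:.3f}")
print(f"MS error  = {ms_error:.3f}")
print(f"F         = {f_ratio:.2f}")
print(f"Pooled SD = {pooled_sd:.4f}")
```

The results agree with the table to the precision shown there.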
Dunnett’s test is the post-hoc test appropriate for comparing the treatment means with a control mean. The output from MINITAB first gives the family error rate as 0.05, which means that a false positive result (claiming a difference due to a treatment when it is only due to chance) will be produced in only 5% of experiments. Because several comparisons are made, MINITAB sets the individual error rate for each comparison at 0.0272. The critical value (2.31) is the multiple of the standard error of a difference between two means that a difference must exceed to be significant at the family error rate of 0.05. The output then consists of the mean difference (Center) and the 95% confidence interval (CI) for that mean difference. If the CI does not span zero (i.e. the signs of the lower and upper limits do not differ), then the difference is significant (as with level 2).
Dunnett’s comparisons with a control

Family error rate = 0.05
Individual error rate = 0.0272
Critical value = 2.31
Control = level (1) of Group

Confidence (95%) intervals for treatment mean minus control mean

|Level|Lower|Center|Upper|
|---|---|---|---|
|2|0.9167|1.7567|2.5966|
|3|-0.6399|0.2000|1.0399|
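The intervals above can be approximately rebuilt from the critical value and the pooled standard deviation, as in the Python sketch below. It assumes 12 mice per group, which is inferred from the 35 total degrees of freedom (36 mice in three groups) and is not stated explicitly in the output. It also uses the critical value as printed (2.31), so the limits differ from MINITAB's in the last decimal places.

```python
import math

pooled_sd = 0.8904       # from the ANOVA output
critical_value = 2.31    # Dunnett critical value as printed (rounded)
n_per_group = 12         # assumption: 36 mice split equally among 3 groups

# Standard error of the difference between two group means of equal size.
se_difference = pooled_sd * math.sqrt(2 / n_per_group)
half_width = critical_value * se_difference

# Mean differences (treatment minus control) copied from the output.
differences = {2: 1.7567, 3: 0.2000}

intervals = {}
for level, centre in differences.items():
    lower, upper = centre - half_width, centre + half_width
    intervals[level] = (lower, upper)
    # A difference is significant when the interval excludes zero.
    significant = lower > 0 or upper < 0
    print(f"Level {level}: {lower:.4f} to {upper:.4f}, significant: {significant}")
```

Only the interval for level 2 excludes zero, matching the conclusion drawn from the MINITAB output.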
Numerical example 2
The aim of this “experiment” was to determine the susceptibility of a number of inbred strains and F1 hybrids of mice to the development of lung tumours following treatment with urethane. Susceptible and resistant strains were later used in research to identify quantitative trait loci controlling susceptibility and in studying the effect of anti-oxidants in preventing the development of cancer. It was already known that strain A/J is susceptible and C57BL/6 is resistant, and these two strains were included for comparative purposes (actually C57BL/6-+/Lprob heterozygous obese mice were used as these were more readily available, and assumed to be identical to C57BL/6 in this respect).
Although it has the appearance of being an experiment (and legally was an experiment), because it was conducted in the laboratory and involved treating mice with a carcinogen, it was really a statistical survey because “strain” cannot be assigned at random to a mouse. Moreover, the purpose of the study was not to test hypotheses, but rather to characterise experimental material.
An appropriate way of determining sample size for a study of this sort is not readily available. Power analysis is not appropriate because the aim was not to test whether strains differ: it was already known that they do differ. The resource equation method suggests that three mice per strain would be adequate, but each strain mean would then be rather poorly estimated. Another complication was that it was already known that C57BL/6 mice would have a very low tumour count and strain A/J a high count, with a strong correlation between mean and standard deviation. This means that a pooled standard deviation based on the raw data would not be appropriate. Some of the more advanced statistical packages for power analysis will calculate the sample size required to produce a confidence interval of a certain size, but specification with ten groups could be difficult. So, a rather subjective estimate of eight mice per strain was used. Had the outcome been the percentage of mice getting a tumour rather than tumour counts, then much larger sample sizes would have been required.
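The resource equation calculation can be sketched as follows. This assumes the usual form of the method, in which E is the total number of animals minus the number of groups and a value of E between about 10 and 20 is regarded as adequate. The function name and the bounds are illustrative and are not taken from the study itself.

```python
def resource_equation_e(n_groups, n_per_group):
    """Error degrees of freedom: total animals minus number of groups."""
    return n_groups * n_per_group - n_groups


N_STRAINS = 10           # ten strains and F1 hybrids in the survey
LOWER, UPPER = 10, 20    # assumed conventional range for E

for n in (2, 3, 8):
    e = resource_equation_e(N_STRAINS, n)
    verdict = "within" if LOWER <= e <= UPPER else "outside"
    print(f"{n} mice per strain: E = {e} ({verdict} the {LOWER}-{UPPER} range)")
```

With ten strains, three mice per strain gives E = 20, the top of the assumed range, whereas the eight per strain actually used gives E = 70. That reflects the decision to estimate each strain mean more precisely than the method alone would require.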
Eight animals of each strain, matched as closely as possible for age, were treated with urethane by i.p. injection and kept for about six months. They were then humanely killed and the number of tumours (which were only a couple of mm in diameter) on the surface of the lungs was counted. The counts are presented in the table.
|Lung tumour counts in mice following treatment with urethane|
|Unpublished data, M.Festing dating from 1977|
A boxplot (MINITAB 14) provides a useful graphical summary of the results. The horizontal bar in the middle of the box is drawn at the median, the box covers the inter-quartile range, and each whisker extends to the lowest or highest value lying within 1.5 times the inter-quartile range of the edge of the box. An asterisk indicates an outlier outside this range.
Such a graphical summary may be sufficient, without further statistical analysis. It clearly shows the susceptible and resistant inbred strains, with two of the F1 hybrids being intermediate and one being resistant.
However, if it is necessary to decide which strains differ then an ANOVA and post-hoc comparisons will be necessary. In view of the heterogeneity of variation between strains a transformation of scale will also be necessary. Counts of this sort often have a Poisson distribution, for which a square root transformation of X+1 is usually appropriate. Such an analysis (not shown) finds, for example, no significant difference between A, A2G and A2G-hr. Finally, a table showing means and standard deviations of the raw, untransformed data could be presented as shown below.
|Mean tumour counts in 10 strains of mice|
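The effect of the square root transformation of X+1 on Poisson-like counts can be illustrated with a small simulation. The sketch below uses simulated counts, not the study's data: the two means (2 and 25), the sample size and the seed are arbitrary choices made only to mimic a resistant and a susceptible strain.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


rng = random.Random(0)   # fixed seed so the run is reproducible
summary = {}
for lam in (2, 25):      # hypothetical low and high mean tumour counts
    counts = [poisson(rng, lam) for _ in range(5000)]
    transformed = [math.sqrt(x + 1) for x in counts]
    summary[lam] = (statistics.stdev(counts), statistics.stdev(transformed))
    print(f"mean {lam:>2}: SD raw = {summary[lam][0]:.2f}, "
          f"SD sqrt(X+1) = {summary[lam][1]:.2f}")

raw_ratio = summary[25][0] / summary[2][0]
transformed_ratio = summary[25][1] / summary[2][1]
print(f"ratio of SDs: raw = {raw_ratio:.2f}, transformed = {transformed_ratio:.2f}")
```

On the raw scale the standard deviation increases with the mean, as expected for a Poisson distribution. After the transformation the two standard deviations are much closer together, which is why a pooled standard deviation becomes more reasonable on the transformed scale.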