Factorial designs

The purpose of factorial designs is to obtain more information from the same number of animals. Briefly, factorial experiments can be used:

To find out whether specified factors influence response to a given treatment
To see whether two factors interact or potentiate each other
To optimise response in screening experiments
To reduce the chance of missing an effect because the chosen material is insensitive.

Two examples are given here. The first is a 2×2 factorial showing what is meant by an interaction, and the second is a 4×2 factorial done using a randomised block design with two blocks.

Numerical example 1. A 2×2 factorial design

The aim of the study was to determine the effect of chloramphenicol on the haematology of mice, and also whether strains differed in their response. Table 1 shows white blood cell (WBC) counts in mice of two strains kept as controls or treated with chloramphenicol. These data were extracted, purely for purposes of illustration, from a larger study involving five strains and six dose levels (Festing et al., 2001, Food Chem. Toxicol. 39:375-383). Mice of each strain (of comparable age and from the same breeder) were assigned at random to each treatment group and dosed by gavage; after an appropriate time they were bled and their white blood cell counts were obtained.

Table 1. White blood cell (WBC) counts in mice of strains CD-1 and CBA kept as controls (C) or given chloramphenicol succinate at 2500 mg/kg (T)
Treatment	CD-1	CBA
C	3.0	1.9
C	1.7	2.6
C	1.5	1.4
C	2.0	1.6
C	*	1.1
T	1.9	0.4
T	1.9	0.2
T	3.5	0.1
T	1.2	0.4
T	2.3	0.3
* indicates missing value

The missing value presents a problem. Factorial designs are most easily analysed if there are equal numbers in each group, which is not the case here. So in this case it is necessary to use a “general linear model analysis of variance” (GLM-ANOVA), which takes account of unequal numbers in each group.

A trial GLM-ANOVA was done with the residual plots (MINITAB 14) shown below. Clearly the variation is not the same in each group (see Residuals versus Fitted Values, where the variation is smaller in the left-hand four points than in the right-hand points). Accordingly it was decided to transform the data to a logarithmic (base 10) scale. One was added to each value before taking logarithms, because some counts are less than one and would otherwise give negative logarithms.
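The effect of the transformation can be checked numerically. The sketch below, in Python using only the standard library, applies log10(x + 1) to the Table 1 counts and compares the spread of the four groups before and after. The dictionary layout and function names are illustrative choices made here, not part of the original analysis.

```python
import math
from statistics import variance

# WBC counts from Table 1 (the missing CD-1 control value is simply omitted)
wbc = {
    ("CD-1", "C"): [3.0, 1.7, 1.5, 2.0],
    ("CD-1", "T"): [1.9, 1.9, 3.5, 1.2, 2.3],
    ("CBA", "C"): [1.9, 2.6, 1.4, 1.6, 1.1],
    ("CBA", "T"): [0.4, 0.2, 0.1, 0.4, 0.3],
}

def log_wbc(x):
    """log10(x + 1): adding one stops counts below 1 giving negative logs."""
    return math.log10(x + 1)

def variance_ratio(groups):
    """Largest within-group variance divided by the smallest."""
    variances = [variance(v) for v in groups.values()]
    return max(variances) / min(variances)

log_data = {k: [log_wbc(x) for x in v] for k, v in wbc.items()}

for key in wbc:
    print(*key, "raw var = %.4f" % variance(wbc[key]),
          "log var = %.5f" % variance(log_data[key]))

print("max/min variance ratio, raw scale: %.1f" % variance_ratio(wbc))
print("max/min variance ratio, log scale: %.1f" % variance_ratio(log_data))
```

With these data the ratio of the largest to the smallest group variance falls from roughly 42 on the raw scale to roughly 6 on the log scale, and the pooled within-group sum of squares of the logged values (about 0.114) matches the Error SS in the MINITAB output for the transformed data.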

The MINITAB GLM-ANOVA output for the logarithmically transformed data is given below. It first lists the factors (strain, Treatment), their type (both fixed), their number of levels (2 in each case) and their values. Then comes the ANOVA table. This differs from previous tables in having a row called “strain*Treatment”, which tests whether the two strains respond to treatment in the same way. This interaction is highly significant (p=0.001), indicating that they do not. The table also has two sums-of-squares columns, “Seq SS” and “Adj SS”, which differ because of the unequal numbers in each group: sequential SS adjust each term only for the terms listed above it, whereas adjusted SS adjust each term for all the other terms in the model, and it is the adjusted SS that are used for the F tests. Below the table is the pooled standard deviation S, and R-Sq, which indicates the proportion of the total variation accounted for by the model (the two factors and their interaction).

General Linear Model: LogWBC versus strain, Treatment

Factor Type Levels Values

strain fixed 2 CBA, CD-1
Treatment fixed 2 C, T

Analysis of Variance for LogWBC, using Adjusted SS for Tests

Source           DF      Seq SS      Adj SS   Adj MS    F        P
strain           1     0.22256    0.21990 0.21990   28.89    0.000
Treatment        1     0.12915    0.11393 0.11393   14.97    0.002
strain*Treatment 1     0.13009    0.13009 0.13009   17.09    0.001
Error            15     0.11416    0.11416 0.00761
Total            18     0.59597

S = 0.0872401 R-Sq = 80.84%

As the strain*treatment p-value is 0.001, the null hypothesis that the strains are responding in the same way should be rejected at p<0.01, and we conclude that the strains differ in response.
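Because the model includes the interaction, each of the three one-degree-of-freedom terms can be written as a contrast among the four cell means, and its adjusted sum of squares is L^2 / sum(c_i^2 / n_i), where L is the value of the contrast, c_i its coefficients and n_i the cell sizes. The Python sketch below uses that standard identity for a full cell-means model to reproduce the Adj SS and F columns, and also the sequential SS for strain fitted first. It is an illustration chosen here, not a description of how MINITAB computes these internally, and p-values are left out to stay within the standard library.

```python
import math
from statistics import mean

# log10(WBC + 1) from Table 1; the missing CD-1 control value is omitted
raw = {
    ("CBA", "C"): [1.9, 2.6, 1.4, 1.6, 1.1],
    ("CBA", "T"): [0.4, 0.2, 0.1, 0.4, 0.3],
    ("CD-1", "C"): [3.0, 1.7, 1.5, 2.0],
    ("CD-1", "T"): [1.9, 1.9, 3.5, 1.2, 2.3],
}
y = {k: [math.log10(x + 1) for x in v] for k, v in raw.items()}
cells = list(y)                       # order: CBA C, CBA T, CD-1 C, CD-1 T
n = {k: len(v) for k, v in y.items()}
m = {k: mean(v) for k, v in y.items()}

# Error SS: pooled within-cell variation (the full model fits every cell mean)
ss_error = sum((x - m[k]) ** 2 for k in cells for x in y[k])
df_error = sum(n.values()) - len(cells)
ms_error = ss_error / df_error

def contrast_ss(coef):
    """SS for a one-df contrast of cell means: L^2 / sum(c_i^2 / n_i)."""
    L = sum(coef[k] * m[k] for k in cells)
    return L ** 2 / sum(coef[k] ** 2 / n[k] for k in cells)

# Coefficients follow the cell order above
adj_ss = {
    "strain": contrast_ss(dict(zip(cells, [-1, -1, 1, 1]))),
    "Treatment": contrast_ss(dict(zip(cells, [1, -1, 1, -1]))),
    "strain*Treatment": contrast_ss(dict(zip(cells, [1, -1, -1, 1]))),
}

# Sequential SS for strain fitted first: the one-way SS between strains,
# ignoring Treatment. It differs from the adjusted SS because n is unequal.
grand = mean([x for k in cells for x in y[k]])
seq_ss_strain = 0.0
for s in ("CBA", "CD-1"):
    vals = [x for k in cells if k[0] == s for x in y[k]]
    seq_ss_strain += len(vals) * (mean(vals) - grand) ** 2

for term, ss in adj_ss.items():
    print(f"{term:17s} Adj SS = {ss:.5f}  F = {ss / ms_error:.2f}")
print(f"Error SS = {ss_error:.5f} on {df_error} df")
print(f"Seq SS (strain fitted first) = {seq_ss_strain:.5f}")
```

The values agree with the MINITAB table to the printed precision. If SciPy is available, `scipy.stats.f.sf(F, 1, 15)` would supply the corresponding p-values.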

An interaction plot is given below which shows that the logWBC counts do not differ much in control mice of the two strains, but that following treatment with chloramphenicol they decline markedly in strain CBA but not in CD-1.

Least Squares Means for LogWBC

strain     Mean   SE of Mean
CBA      0.2663   0.02759
CD-1     0.4825   0.02926

Treatment   Mean   SE of Mean
C         0.4522   0.02926
T         0.2966   0.02759

strain*Treatment   Mean   SE of Mean
CBA  C          0.4272   0.03901
CBA  T          0.1054   0.03901
CD-1 C          0.4771   0.04362
CD-1 T          0.4878   0.03901

The “least squares means”, which take account of the missing value and therefore of the fact that the design is unbalanced, are given above with their standard errors. Look at the strain*Treatment means. Whereas in strain CBA the treated group has a substantially lower WBC count (still on the log scale) than the controls, the same is not true in strain CD-1. When presenting these data in a paper, the means should be anti-logged and then one subtracted to return them to the original scale, although it is not appropriate to back-transform the standard deviations (or standard errors) in this way.
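The least squares means are unweighted averages of the cell means, so the missing value does not tilt the CD-1 strain mean towards the treated group. A minimal Python sketch, assuming the same log10(x + 1) scale, computes them with the standard error sqrt(MSE × sum(1/n_i)) / k for an average of k cell means, and back-transforms the means as described above. The helper names are illustrative.

```python
import math
from statistics import mean

# Table 1 WBC counts; the missing CD-1 control value is omitted
raw = {
    ("CBA", "C"): [1.9, 2.6, 1.4, 1.6, 1.1],
    ("CBA", "T"): [0.4, 0.2, 0.1, 0.4, 0.3],
    ("CD-1", "C"): [3.0, 1.7, 1.5, 2.0],
    ("CD-1", "T"): [1.9, 1.9, 3.5, 1.2, 2.3],
}
y = {k: [math.log10(x + 1) for x in v] for k, v in raw.items()}
n = {k: len(v) for k, v in y.items()}
m = {k: mean(v) for k, v in y.items()}

# Residual mean square from the full strain*Treatment model (15 df here)
ss_error = sum((x - m[k]) ** 2 for k in y for x in y[k])
ms_error = ss_error / (sum(n.values()) - len(y))

def ls_mean(keys):
    """Unweighted average of the chosen cell means, and its standard error."""
    est = mean(m[k] for k in keys)
    se = math.sqrt(ms_error * sum(1 / n[k] for k in keys)) / len(keys)
    return est, se

def back_transform(log_mean):
    """Anti-log, then subtract the 1 that was added before taking logs."""
    return 10 ** log_mean - 1

for k in y:
    est, se = ls_mean([k])
    print(*k, f"{est:.4f} (SE {se:.5f}) -> {back_transform(est):.2f} on the WBC scale")
for strain in ("CBA", "CD-1"):
    est, se = ls_mean([k for k in y if k[0] == strain])
    print(strain, f"{est:.4f} (SE {se:.5f})")
```

For the CD-1 controls the back-transformed mean is exactly 2.0, because the geometric mean of 4.0, 2.7, 2.5 and 3.0 (the counts plus one) is 3. Only the means are back-transformed; the standard errors stay on the log scale.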

Numerical example 2. A factorial-randomised block design

Randomised block designs are used to break up the experiment into a number of “mini-experiments”. This can be done for convenience, to make efficient use of heterogeneous material, because the material has some natural structure, or to ensure that the experiment is repeatable over time. The following (real) experiment was done to find out whether BHA (an antioxidant) induced activity of the liver enzyme “EROD”, and the extent to which this was strain dependent (Festing, 2003, Trends Pharmacol. Sci. 24:341-345). It was part of a larger study to find out whether antioxidants help to protect against cancer.

The experiment was done as a 4 (strains) × 2 (treatments) × 2 (blocks) factorial design. The two randomised blocks were separated by about three months. Blocking was used both for convenience, as only eight mice needed to be handled at a time, and to ensure that the results were repeatable. Blocking in time is common in in-vitro studies but much less common in in-vivo experiments, although it has much to recommend it.

For each block, the experiment involved obtaining two mice of each strain, matched for age, free of disease, etc. One mouse of each pair was assigned at random to the treated group and the other to the control group. The BHA was incorporated in the diet (see reference for details of dose levels, husbandry and biochemical methods). After three weeks the mice were humanely killed and the activity of the liver enzyme was determined.
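The allocation within each block can be sketched in a few lines of Python. The mouse labels and the random seed are hypothetical; the point is only that, within each block and each strain, one mouse of the matched pair goes at random to each treatment.

```python
import random

STRAINS = ["A/J", "129/Ola", "NIH", "BALB/c"]

def randomise_block(block, rng):
    """For one block, randomly assign one mouse of each strain's matched pair
    to BHA treatment and the other to control. Labels are hypothetical."""
    allocation = []
    for strain in STRAINS:
        treatments = ["treated", "control"]
        rng.shuffle(treatments)  # random order = random assignment of the pair
        for i, treatment in enumerate(treatments, start=1):
            allocation.append((block, strain, f"{strain}-b{block}-m{i}", treatment))
    return allocation

rng = random.Random(2003)  # fixed seed only so that the plan can be reproduced
plan = [row for block in (1, 2) for row in randomise_block(block, rng)]
for row in plan:
    print(*row, sep="\t")
```

Each block is a complete replicate of the 4 × 2 factorial, so the second block can be randomised and run months after the first without confounding time with treatment.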

Activity of EROD liver enzyme in control and BHA-treated mice
Strain	Block 1 Treated	Block 1 Control	Block 2 Treated	Block 2 Control
A/J	18.7	7.7	16.7	6.4
129/Ola	17.9	8.4	14.4	6.7
NIH	19.2	9.8	12.0	8.1
BALB/c	26.3	9.7	19.8	6.0

The results are shown in the above table. In every case the values in the treated animals are higher than those in the controls. However, every value in block 2 is also lower than the corresponding value in block 1. This is not an uncommon finding and is probably due to differences in reagents, calibration of instruments and environmental influences. It shows the importance of including both treated and control animals in every block. If all the controls had been done first, and then the treated animals three months later, statistically significant differences would probably have been found even if there were no true treatment effect.
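The size of the time effect can be seen from the controls alone, since none of them received BHA. The Python sketch below compares the control values in the two blocks, strain by strain, with a paired t statistic. This is a rough illustration chosen here, not an analysis from the original study.

```python
import math
from statistics import mean, stdev

# Control (untreated) EROD values from the table, strains in table order:
# A/J, 129/Ola, NIH, BALB/c
control_block1 = [7.7, 8.4, 9.8, 9.7]
control_block2 = [6.4, 6.7, 8.1, 6.0]

# Block 1 minus block 2 for each strain, then a paired t statistic
diffs = [a - b for a, b in zip(control_block1, control_block2)]
d_bar = mean(diffs)
t = d_bar / (stdev(diffs) / math.sqrt(len(diffs)))
print(f"mean block 1 - block 2 difference in controls: {d_bar:.2f}")
print(f"paired t = {t:.2f} on {len(diffs) - 1} df")
```

The mean difference is 2.1 units and t is about 3.9 on 3 degrees of freedom, above the two-sided 5% critical value of 3.18. A purely time-related difference between two untreated groups would therefore itself have appeared "significant", which is exactly what a design with controls in one period and treated animals in another would wrongly attribute to BHA.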

The statistical analysis now involves a three-way ANOVA, with strain and treatment as “fixed effects” and block as a “random effect”. The ANOVA done using MINITAB is shown below. Note that MINITAB uses a “general linear model ANOVA”, which allows for unequal numbers in each group. It therefore has two columns for the sums of squares, labelled SeqSS and AdjSS. These would differ if there were unequal numbers in each group; as the design is balanced, they are the same.

Below the heading, MINITAB lists the factors, their type (random or fixed), their levels (2 for blocks and treatments, 4 for strains) and their values (coded in this case). Below that is the ANOVA proper. There were large effects of block (p=0.004) and treatment (p<0.0005), and the Treatment*Strain interaction was significant at p=0.034. This implies that the strains differed slightly in their response.

Residuals plots were studied (not shown), and there was no evidence for lack of normality of the residuals or heterogeneity of variance.

MINITAB output from the ANOVA

Factor Type Levels Values
Block random 2 1 2
Treatmen fixed 2 1 2
Strain fixed 4 1 2 3 4

Analysis of Variance for EROD, using Adjusted SS for Tests

Source          DF    SeqSS    AdjSS   Adj MS       F      P
Block            1   47.610   47.610   47.610   18.37  0.004
Treatmen         1  422.302  422.302  422.302  162.96  0.000
Strain           3   32.962   32.962   10.987    4.24  0.053
Treat.*Strain    3   40.343   40.343   13.448    5.19  0.034
Error            7   18.140   18.140    2.591
Total           15  561.357
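Because the design is balanced, the sums of squares in the MINITAB table can be reproduced with simple between-group calculations. A minimal Python sketch, assuming the additive block model used above (block, treatment, strain and treatment × strain, with every F ratio formed against the error mean square, which is what the table's F values correspond to):

```python
strains = ["A/J", "129/Ola", "NIH", "BALB/c"]
# EROD activity from the table: block -> treatment -> values in strain order
table = {
    1: {"Treated": [18.7, 17.9, 19.2, 26.3], "Control": [7.7, 8.4, 9.8, 9.7]},
    2: {"Treated": [16.7, 14.4, 12.0, 19.8], "Control": [6.4, 6.7, 8.1, 6.0]},
}
# y[(block, treatment, strain)] = one mouse's value
y = {(b, t, s): table[b][t][i]
     for b in table for t in table[b] for i, s in enumerate(strains)}

N = len(y)
grand = sum(y.values()) / N

def factor_ss(key_fn):
    """Between-groups SS for the grouping defined by key_fn (balanced data)."""
    groups = {}
    for k, v in y.items():
        groups.setdefault(key_fn(k), []).append(v)
    return sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())

ss_total = sum((v - grand) ** 2 for v in y.values())
ss_block = factor_ss(lambda k: k[0])
ss_treat = factor_ss(lambda k: k[1])
ss_strain = factor_ss(lambda k: k[2])
# Interaction = variation among treatment-by-strain cells not explained by
# the two main effects
ss_inter = factor_ss(lambda k: (k[1], k[2])) - ss_treat - ss_strain
ss_error = ss_total - ss_block - ss_treat - ss_strain - ss_inter

df = {"Block": 1, "Treatment": 1, "Strain": 3, "Treatment*Strain": 3}
df_error = N - 1 - sum(df.values())
ms_error = ss_error / df_error
for name, ss in zip(df, [ss_block, ss_treat, ss_strain, ss_inter]):
    print(f"{name:17s} DF={df[name]} SS={ss:8.3f} F={ss / df[name] / ms_error:7.2f}")
print(f"Error             DF={df_error} SS={ss_error:8.3f}")
```

With unequal numbers these simple formulas would no longer give MINITAB's adjusted SS, which is why a general linear model is used by default.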

The Treatment*Strain interaction plot produced by MINITAB is shown below. Treatment 1 is the treated group, and in every strain its values were much higher than those of the controls. However, the difference was clearly somewhat larger for BALB/c (green). Post-hoc comparisons (not shown) indicate that the interaction is indeed largely due to BALB/c being slightly more sensitive than the other strains. Whether this has important biological implications is open to discussion; it may well be an example of a statistically significant effect which is of little biological importance.
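The interaction can be made concrete by calculating the treatment effect separately for each strain from the within-block differences in the table. A short Python sketch (variable names are illustrative):

```python
from statistics import mean

strains = ["A/J", "129/Ola", "NIH", "BALB/c"]
# EROD values from the table, strains in table order, keyed by block
treated = {1: [18.7, 17.9, 19.2, 26.3], 2: [16.7, 14.4, 12.0, 19.8]}
control = {1: [7.7, 8.4, 9.8, 9.7], 2: [6.4, 6.7, 8.1, 6.0]}

# Treated minus control within each block, averaged over the two blocks
effect = {
    s: mean(treated[b][i] - control[b][i] for b in (1, 2))
    for i, s in enumerate(strains)
}
overall = mean(effect.values())
for s, e in effect.items():
    print(f"{s:8s} treated - control = {e:5.2f} (overall {overall:.3f})")
```

This gives treated minus control differences of 10.65 (A/J), 8.60 (129/Ola), 6.65 (NIH) and 15.20 (BALB/c), against an overall mean of 10.275. Each strain difference is based on two pairs of mice, so its standard error, using the error mean square of 2.591, is roughly sqrt(2.591) ≈ 1.6.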

A plot of the strain and treatment means is shown below. Controls are in yellow, treated in blue. Each mean is based on two animals. Some people have expressed doubts about the validity of means based on such small numbers, but in fact the sample size for comparing treated versus control is eight per group (averaging across strains). Moreover, because the mice within each strain are genetically alike, means of two are more informative than they would be if each individual were genetically different, as would be the case had the experiment been done using outbred mice.
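The point about sample size can be checked with the ANOVA error mean square. Averaging over strains, each treatment mean is based on eight mice, so the standard error of the treated minus control difference is sqrt(MSE × (1/8 + 1/8)). A minimal Python sketch, taking MSE from the MINITAB table:

```python
import math
from statistics import mean

treated = [18.7, 17.9, 19.2, 26.3, 16.7, 14.4, 12.0, 19.8]  # all 8 treated mice
control = [7.7, 8.4, 9.8, 9.7, 6.4, 6.7, 8.1, 6.0]          # all 8 controls
ms_error = 18.140 / 7  # Error SS / DF from the MINITAB ANOVA table

diff = mean(treated) - mean(control)
se_diff = math.sqrt(ms_error * (1 / len(treated) + 1 / len(control)))
t = diff / se_diff
print(f"treated - control = {diff:.3f}, SE = {se_diff:.3f}, t = {t:.2f}, t^2 = {t * t:.2f}")
```

This gives a difference of 10.275 with a standard error of about 0.80. The square of the resulting t statistic is about 162.96, the Treatment F ratio in the table, as it should be for a one-degree-of-freedom comparison.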

Other designs

The designs considered here are by no means exhaustive. However, most other designs should not be attempted without professional advice, and discussing them is beyond the scope of this web site.