Power analysis

The Power Analysis method of estimating sample size depends on a mathematical relationship between the following six variables.

  • Variability of the material
    • An estimate of the standard deviation of the experimental subjects is necessary (for quantitative variables). This must come from a previous study, a pilot experiment or the literature. This is the main weakness of the method, because the calculated sample size depends critically on this estimate.

  • Effect size of clinical or biological importance
    • Consider an experiment with just a control and a treated group. A small difference in the means may be of little scientific or clinical interest. However, an investigator would be very interested in being able to detect a large difference. Thus, the investigator needs to be able to specify the minimum effect size likely to be of interest.
      For quantitative characters it is often helpful to express the effect size in standard deviation units by dividing it by the standard deviation (SDev.). In this way all traits are in the same units, and it becomes easier to judge the consequences of choosing various effect sizes. This is described in more detail below. Detecting an effect size smaller than one SDev. will require a "large" experiment. Detecting one greater than two SDevs. will require only a "small" experiment.

  • Significance level
    • This is usually set at 0.05, but in some circumstances it may be more appropriate to use a different figure. For example, power will be higher if the significance level is set at 0.1 rather than 0.05, so if the aim is to prove a negative (i.e. that the treatment is having no effect), then the significance level may be set at 0.1.

  • Power
    • The power is the probability of being able to detect the specified effect and call it significant at the designated level of significance. Most people will want a powerful experiment. Usually this is set somewhere between 80% and 95%. The higher the specified power, the larger the sample size that will be needed, other things being equal. High power is needed if the consequences of failing to detect a treatment effect are likely to be serious.

  • Sidedness of the test
    • In most circumstances it will not be known whether the treatment will increase or decrease the mean of the character of interest, so a two-sided test should be used. In some circumstances there will be a good biological reason why the effect of the treatment can only go in one direction. In this case a one-sided test should be used.

  • Sample size
    • The purpose of the power analysis is usually to determine sample size. However, where resources are limited, sample size may be fixed and the aim of the analysis might then be to determine the power of the experiment or the effect size likely to be detected.

Putting it together

The mathematical equations relating these variables are complex. Many modern statistical packages, such as MINITAB, now offer power analysis calculations. There are also a number of free web sites which will do the calculations for simpler situations.
 
There are also several stand-alone statistical packages, such as nQuery Advisor, which offer power analysis for a wide range of situations, although these are not inexpensive.
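
As a rough illustration of how the variables are linked, the sketch below computes approximate power for a two-group comparison of means using a normal approximation. This is simpler than the exact t-based calculation done by the packages mentioned above and is slightly optimistic for small groups. The function name approx_power and its parameters are made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_sd_units, n_per_group, alpha=0.05, two_sided=True):
    """Approximate power of a two-group comparison of means.

    effect_sd_units: difference between group means divided by the SD.
    Uses a normal approximation, not the exact noncentral t calculation.
    """
    z = NormalDist()
    # A two-sided test splits alpha between the two tails.
    tail = alpha / 2 if two_sided else alpha
    z_crit = z.inv_cdf(1 - tail)
    # Expected value of the test statistic if the true effect is as specified.
    shift = effect_sd_units * sqrt(n_per_group / 2)
    # The negligible chance of significance in the wrong direction is ignored.
    return z.cdf(shift - z_crit)


# Effect of one SD with 20 animals per group, under three test settings.
for alpha, two_sided in [(0.05, True), (0.10, True), (0.05, False)]:
    label = "two-sided" if two_sided else "one-sided"
    print(f"alpha={alpha:.2f} {label}: power ~ {approx_power(1.0, 20, alpha, two_sided):.2f}")
```

With these inputs the power rises when the significance level is relaxed from 0.05 to 0.1, and a one-sided test at 0.05 has the same approximate power as a two-sided test at 0.1.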

The two sample case

The graph in Fig. 1 shows the sample size as a function of the effect size in standard deviation units for a 90% power, a 5% significance level and a two-sided test. Thus, using this, an effect size (difference between the means of the treated and control groups) equal to one standard deviation will require about 23 animals per group.

Fig. 1. Sample size as a function of effect size in standard deviations assuming a 90% power, a 5% significance level and a two-sided test.

Note that this graph may also be used to determine effect size if sample size is fixed.
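
The general shape of the curve in Fig. 1 can be approximated with the normal-approximation formula n = 2(z_alpha + z_power)^2 / d^2, where d is the effect size in standard deviation units. The sketch below is an approximation under that assumption, not the calculation used to draw the graph, and the function name n_per_group is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_sd_units, power=0.90, alpha=0.05, two_sided=True):
    """Approximate animals per group for a two-group comparison of means.

    Normal approximation; it ignores the small-sample t correction.
    """
    z = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = z.inv_cdf(1 - tail)   # critical value for the significance level
    z_power = z.inv_cdf(power)      # quantile corresponding to the power
    n = 2 * (z_alpha + z_power) ** 2 / effect_sd_units ** 2
    return ceil(n)                  # round up to whole animals


# Sample size for a range of effect sizes (90% power, 5% level, two-sided).
for d in (0.5, 1.0, 1.5, 2.0):
    print(f"effect size {d:.1f} SD: about {n_per_group(d)} per group")
```

For an effect of one standard deviation this gives 22 per group, one fewer than the 23 read from the graph. The approximation tends to come out about one animal per group below a t-based calculation, so it is best treated as a lower bound when groups are small.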

Large group sizes are required to detect small effects, such as those of less than half a standard deviation. However, anyone using laboratory mice or rats has enormous control over variability. Isogenic strains raised in a controlled environment, free of disease, fed a uniform diet and matched for age and body weight are very uniform, so the standard deviation is much smaller than that found, for example, in human studies. This means that, in terms of standard deviations, most research workers are only interested in studying "large" effects of over one standard deviation in magnitude. Effects of two or more standard deviations can be detected with groups of only about eight animals.

Example using the graph.

An investigator wishes to compare two anaesthetics for dogs. In particular, if there were a difference in blood pressure under anaesthesia of 10 mmHg or more, she would like to know about it. She plans to do the experiment using beagles, and previous studies show that their mean blood pressure under an anaesthetic is 108 mmHg, with a standard deviation (SD) of 9 mmHg. The effect size is therefore 10/9 = 1.1 SDs, and the data will be analysed using a two-sample t-test. Reading from the graph, this will require about 20 dogs per group. The same calculations can be done using www.biomath.info.

Suppose only 30 dogs are available. From the graph it is possible to estimate that, with 15 animals per group, the effect size likely to be detectable (with the assumptions given) is about 1.3 SDs, or 1.3 × 9 ≈ 12 mmHg.
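
The reverse calculation, from a fixed group size to a detectable effect, can be sketched with the same normal approximation. The SD of 9 mmHg and the 15 dogs per group come from the example; the function name detectable_effect is hypothetical, and the result is only a rough check on the graph reading.

```python
from math import sqrt
from statistics import NormalDist


def detectable_effect(n_per_group, sd, power=0.90, alpha=0.05):
    """Approximate smallest difference in means detectable with two groups.

    Two-sided test, normal approximation (no small-sample t correction).
    The result is in the same units as sd.
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return z_sum * sd * sqrt(2 / n_per_group)


sd_mmhg = 9.0                      # SD of blood pressure from previous studies
effect = detectable_effect(15, sd_mmhg)
print(f"detectable effect: {effect / sd_mmhg:.2f} SDs, or {effect:.1f} mmHg")
```

The approximation gives a figure a little under 1.2 SDs, somewhat below the 1.3 SDs read from the graph. Part of the gap is the t correction that the approximation ignores, and part is the imprecision of reading a value from a curve.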

(Note that, rather than using a between-animal design, it would probably be better to test both anaesthetics on each dog in random order using a within-animal design. Estimation of sample size in this case would require an estimate of the standard deviation of blood pressure of dogs repeatedly anaesthetised with the same anaesthetic. The resulting data would be analysed using a paired t-test. Power calculations for the paired t-test are provided at www.biomath.info.)
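
A minimal sketch of the within-animal calculation follows, again using a normal approximation. It assumes the two measurements on a dog have independent errors with the same within-dog SD, so that the variance of their difference is twice the within-dog variance. The within-dog SD of 5 mmHg is a purely hypothetical value chosen for illustration; no such estimate is given in the example. The function name n_animals_paired is also made up.

```python
from math import ceil
from statistics import NormalDist


def n_animals_paired(difference, sd_within, power=0.90, alpha=0.05):
    """Approximate number of animals for a paired (within-animal) design.

    sd_within: SD of repeated measurements on the same animal under the
    same treatment. Two-sided test, normal approximation.
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Variance of (treatment A - treatment B) within one animal, assuming
    # independent measurement errors on the two occasions.
    var_diff = 2 * sd_within ** 2
    return ceil(z_sum ** 2 * var_diff / difference ** 2)


# 10 mmHg difference of interest; 5 mmHg within-dog SD is hypothetical.
print(n_animals_paired(difference=10.0, sd_within=5.0))
```

With numbers this small the normal approximation is optimistic, so a t-based calculation such as the one offered at www.biomath.info would give a somewhat larger figure.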

Table 1 shows the sample size needed when comparing two proportions, assuming a 5% significance level and a 90% power. Thus, to distinguish between a 20% incidence and a 40% incidence of some binary trait will require 109 animals in each group. These are very large sample sizes. Clearly, it is very much better to measure something than to count!

The two sample case with binary outcomes (e.g. percentages)

Table 1. Number of animals required in each group when comparing two proportions (based on a normal approximation of the binomial distribution), with a significance level of 0.05 and a power of 90%.

Proportion in each group

         0.2     0.3     0.4     0.5     0.6     0.7
0.2        -
0.3      392       -
0.4      109     477       -
0.5       52     124     519       -
0.6       30      56     130     519       -
0.7    19(a)      31      56     124     477       -
0.8    13(a)   19(a)      30      52     109     392

(a) Assumptions may lead to some inaccuracy.
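
The entries in Table 1 appear to be consistent with the usual normal-approximation formula for two proportions, in which the variance under the null hypothesis is pooled. That is an inference from the numbers rather than something stated with the table, and the function name below is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_proportions(p1, p2, power=0.90, alpha=0.05):
    """Approximate animals per group to distinguish proportions p1 and p2.

    Two-sided test, normal approximation to the binomial distribution.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    # Spread of the difference under the null hypothesis (common proportion).
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    # Spread of the difference under the alternative (separate proportions).
    alt_term = z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil((null_term + alt_term) ** 2 / (p1 - p2) ** 2)


# Incidence of 20% in one group versus 40% in the other.
print(n_per_group_proportions(0.2, 0.4))
```

For 20% versus 40% this gives 109 per group, matching the example in the text. The approximation becomes less reliable when the required groups are small, which is presumably why the smallest entries in the table carry a caveat.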

More complex situations

With more than two groups it is more difficult to specify the effect size of scientific interest, and the more complex situations are not generally catered for by the free web sites. The problem is tackled in different ways by different computer packages. MINITAB, for example, asks you to specify the difference between the two most extreme means, while nQuery Advisor gets you to specify group means and then calculates their standard deviation.

The ILAR web site (http://dels.nas.edu) provides extensive information on all aspects of laboratory animal science. The full text of an excellent article on power analysis is given at

http://dels.nas.edu/ilar_n/ilarjournal/43_4/v4304Dell.shtml

A book on the design of animal experiments aimed at biomedical research workers can be found at

www.lal.org.uk/hbook14.htm