Question

Estimate The Type 2 Error In A Microarray Study

4

13.5 years ago

Julien Textoris ▴ 430

Hi all,

In response to the review of a clinical paper in which we analyzed the whole-blood transcriptome of patients with pulmonary infection, and for which the results are negative, the reviewer asked me to estimate the type II error, or power, of the analysis. That is a fair question given the negative results. However, I don't really know how to compute or estimate this. The FDR and multiple-testing corrections address the type I error, but I don't know how to handle the type II error under multiple testing. The analysis was a supervised analysis with the SAM algorithm comparing two groups of patients (one microarray per patient).

I hope someone can give me a clue.

Thanks in advance

Julien

microarray transcriptome error • 2.8k views

Updated 13.5 years ago by Michael 54k • written 13.5 years ago by Julien Textoris ▴ 430

Michael · Answer 1 · 2010-11-29

5

13.5 years ago

User 59 13k

There are packages to calculate power in R/Bioconductor; sizepower springs to mind:

"This package has been prepared to assist users in computing either a sample size or power value for a microarray experimental study. The user is referred to the cited references for technical background on the methodology underpinning these calculations. This package provides support for five types of sample size and power calculations. These five types can be adapted in various ways to encompass many of the standard designs encountered in practice."

You should, of course, do this prior to your experiment to know how many samples to use for a desired power, rather than using it as a post-hoc assessment of the work you've done.
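As a rough illustration of planning sample size up front, here is a minimal sketch using base R's power.t.test rather than sizepower itself. The planning values (a difference of 1 on the log2 scale, a per-gene standard deviation of 0.7, and a per-gene significance level of 0.001 as a crude stand-in for a multiple-testing correction) are assumptions chosen for the example, not recommendations.

```r
# Hypothetical planning values: a 1-unit difference in log2 expression
# (a two-fold change) and a per-gene standard deviation of 0.7.
# Leaving n unspecified makes power.t.test solve for the sample size.
plan <- power.t.test(delta = 1, sd = 0.7, power = 0.8,
                     sig.level = 0.001)  # strict alpha: crude proxy for multiple testing

# Round up to whole samples per group.
n_per_group <- ceiling(plan$n)
n_per_group
```

The result is the number of arrays needed in each group for a single gene under a plain t-test, so it is best read as a lower bound on what a genome-wide SAM analysis would need.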

Updated 13.5 years ago by Michael 54k • written 13.5 years ago by User 59 13k

2

+1 for doing power analysis BEFORE the study ... but I find that this is never the case.

13.5 years ago by Will 4.5k

1

The sizepower package looks more versatile than SSPA, as it works with many experimental designs while SSPA only works with two-sample comparisons.

13.5 years ago by Michael 54k

0

I think Michael fixed my link - thanks :)

13.5 years ago by User 59 13k

0

It depends on which community I'm working with: the medics are used to doing it for studies involving patients, so they expect to do it for their array studies as well (and referees in medical journals seem more inclined to ask for it if it is not stated). But it's not widespread amongst the biologists I work with.

13.5 years ago by User 59 13k

Ram · Answer 2 · 2010-11-29

Your reviewer seems to be picky, because the power is really not trivial to estimate.

To estimate the power of a test, e.g. the t-test, you need to know:

the type of the test (one/two sided, one/two sample,...)
sample size
the true difference in means between groups (for a really differentially expressed gene)
the standard deviation (this is for one gene)

The true difference in means and standard deviation are normally not that easily available.
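To see how strongly the result depends on those two unknowns, here is a small sketch that tabulates the power of a two-sided, two-sample t-test over a grid of differences and standard deviations. The grid values and the 20 samples per group are arbitrary assumptions for illustration.

```r
# Power of a two-sided, two-sample t-test with 20 samples per group,
# over a grid of hypothetical differences in means and standard deviations.
deltas <- c(0.5, 1, 2)
sds    <- c(0.5, 1, 2)

# outer() passes expanded vectors, so the scalar function is vectorised first.
power_grid <- outer(deltas, sds, Vectorize(function(d, s) {
  power.t.test(n = 20, delta = d, sd = s, sig.level = 0.05,
               type = "two.sample", alternative = "two.sided")$power
}))
dimnames(power_grid) <- list(delta = deltas, sd = sds)

round(power_grid, 3)
```

Rows are differences in means and columns are standard deviations. Only the ratio delta/sd matters, which is why a poor guess for either one moves the power estimate so much.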

A simplistic (frequentist) way would be to take the maximum likelihood estimate of the standard deviation from your data, average it over genes, and then set the true difference in means to an arbitrary value (e.g. 2). Then you can claim that, with these parameters, a t-test would yield this power if the data are normally distributed.

In R you can use the function power.t.test for this simple calculation:

> power.t.test(n = 20, delta = 1)

     Two-sample t test power calculation

              n = 20
          delta = 1
             sd = 1
      sig.level = 0.05
          power = 0.8689528
    alternative = two.sided

NOTE: n is number in *each* group

That way you will probably vastly mis-estimate the variance and thus be overly optimistic about the power of your test, so the power.t.test function is more for illustration and understanding the concept.
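For what it is worth, the simplistic approach can be sketched on a whole expression matrix as follows. Everything here is an assumption for illustration: the matrix is simulated, the group sizes and the difference of 1 (log2 scale) are arbitrary, and the power is computed per gene from a pooled within-group standard deviation instead of from a single averaged value. It also treats the analysis as a plain t-test, which SAM is not.

```r
set.seed(1)
n_per_group <- 10
n_genes     <- 1000

# Hypothetical log2 expression matrix: genes in rows, samples in columns.
expr  <- matrix(rnorm(n_genes * 2 * n_per_group, mean = 8, sd = 0.6),
                nrow = n_genes)
group <- rep(c("A", "B"), each = n_per_group)

# Pooled within-group standard deviation for each gene (equal group sizes).
var_a     <- apply(expr[, group == "A"], 1, var)
var_b     <- apply(expr[, group == "B"], 1, var)
sd_pooled <- sqrt((var_a + var_b) / 2)

# Per-gene power to detect a difference of 1 at a strict per-gene alpha,
# used here as a crude stand-in for a multiple-testing correction.
gene_power <- vapply(sd_pooled, function(s) {
  power.t.test(n = n_per_group, delta = 1, sd = s, sig.level = 0.001)$power
}, numeric(1))

summary(gene_power)
```

Reporting the distribution of per-gene power, rather than one number, at least shows the reviewer how much the answer varies with the variance estimate.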

More sophisticated methods I found:

The SSPA package in Bioconductor.

You could also have a look at Black and Doerge (2002) and especially Page et al. (2006) who have implemented the PowerAtlas software for power analysis based on publicly available data.

Black MA, Doerge RW (2002) Calculation of the minimum number of replicate spots required for detection of significant gene expression fold change in microarray experiments. Bioinformatics 18(12):1609–1616

Page GP, Edwards JW, Gadbury GL et al (2006) The PowerAtlas: a power and sample size atlas for microarray experimental design and research. BMC Bioinformatics 7:84