Question

What Are Some Good Resources To Convince Biologists To Power Studies Appropriately?

10

12.2 years ago

hurfdurf ▴ 490

I've been given an opportunity to produce some guidelines for the design of microarray experiments. I know there are many resources and guides for determining appropriate microarray experiment sample sizes.

However, what I'm looking for are data sets or examples that will actually convince non-math friendly crowds to consider sample size more seriously before committing resources to experiments. I would like to do a short presentation with horror stories of poor experimental design if possible. Microarray specific examples would be good, but other engaging examples from other fields with multiple comparison problems are welcome. I'd like to avoid formulas and statistical terminology as much as possible to avoid jargon narcosis, as per this question. These are primarily Affy arrays.

affymetrix microarray • 2.6k views

updated 4.0 years ago by Biostar 20 • written 12.2 years ago by hurfdurf ▴ 490

score 4 · Answer 1 · 2012-02-10

4

12.2 years ago

Mary 11k

Sometimes humor can be effective: Researcher convos with Biostatistician

1

I am not sure it will go down very well with the experimentalists. Let us know reactions in case you go ahead ;)

12.2 years ago by Stefano Berri 4.4k

0

Yes, it would have to be done gently, but I can imagine that in a presentation hurfdurf could show this and make me the "bad cop" by saying, "I asked some bioinfo geeks and they gave me this..." and then play "good cop": "But of course what I want is to note the need to be involved in design to help you get the most out of this..." I bet they'd remember it. :)

12.2 years ago by Mary 11k

Ram · Answer 2 · 2012-02-10

Hi,

Do you have specific poor experimental designs in mind that you want to discuss? That would make it easier to find examples.

One thing I have in mind, and one of the major factors to consider, is biological replicates. I think this is easy to illustrate. For example, if you have a microarray dataset with biological replicates at hand, you can look for genes whose expression levels vary a lot between individuals. You should be able to find some genes (or sets of genes) that seem up-regulated in condition A if you compare two specific individuals but appear down-regulated if you compare two different individuals. Biologists can easily understand this kind of plot, and there may be no need to go into statistical methods. An extension would be to show how a large number of biological replicates reduces the impact of this variability (such datasets exist for mouse, but I could not find them again).
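
If no suitable real dataset is at hand, the same point can be made with a small simulation. The sketch below is only an illustration under assumptions of my own: a single gene with no true difference between conditions, normally distributed log2 expression with a mean of 8 and a between-individual standard deviation of 1, and hypothetical function names. It counts how often a one-individual-versus-one-individual comparison makes the gene look "up", and how the spread of the estimated difference shrinks as replicates are added.

```python
import random
import statistics


def fraction_looking_up(n_pairs=1000, sd=1.0, seed=0):
    """Fraction of single A-vs-B comparisons in which a gene with NO true
    difference looks up-regulated in condition B (assumed toy model)."""
    rng = random.Random(seed)
    up = 0
    for _ in range(n_pairs):
        a = rng.gauss(8.0, sd)  # log2 expression, one individual, condition A
        b = rng.gauss(8.0, sd)  # log2 expression, one individual, condition B
        if b > a:
            up += 1
    return up / n_pairs


def spread_of_mean_difference(n_reps, n_sim=2000, sd=1.0, seed=0):
    """Standard deviation of (mean B - mean A) across simulated experiments
    that each use n_reps biological replicates per condition."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sim):
        a = [rng.gauss(8.0, sd) for _ in range(n_reps)]
        b = [rng.gauss(8.0, sd) for _ in range(n_reps)]
        diffs.append(statistics.mean(b) - statistics.mean(a))
    return statistics.stdev(diffs)


if __name__ == "__main__":
    print("fraction of pairs where the gene looks 'up':", fraction_looking_up())
    for n in (1, 2, 4, 8):
        print(n, "replicates per condition, spread of difference:",
              round(spread_of_mean_difference(n), 2))
```

With one individual per condition the gene looks "up" in roughly half of the comparisons and "down" in the rest, which is the flip-flop described above. A histogram of the simulated differences for 1 versus 8 replicates would probably get the message across without any formula.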

Another aspect you might want to discuss is batch, day-of-experiment, or technician effects. Sometimes, for various reasons, expression profiles or tissue/condition transcriptomes cluster by one of these factors. This is something you should check anyway when analyzing your data, but careful upstream design of the experiment already helps. I am sure you can find examples where different transcriptome experiments cluster according to the day of the experiment or the technician who performed it. This is illustrated, for example, by this BioStar question from yesterday. Showing such a bias (a heatmap with clustering should be enough) might draw your audience's attention to the problem. The take-home message would then be to randomize your experiments and not to process all your condition A samples on the same day and your condition B samples on a different day with a different technician, because you would then be unable to distinguish technical variability from actual biological differences. A similar issue is controlling the time of day of tissue/cell sampling, the diet of the animals, and so on. This seems quite obvious but is worth mentioning, since these factors can explain some environmentally induced variability. There is quite a lot of literature on this (notably on circadian rhythms and diet effects), which might be enough to provide as references.
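
The randomization advice can also be made concrete with a few lines of code. This is a minimal sketch, assuming samples are given as (sample_id, condition) pairs and that each processing day or technician counts as one batch; the function name and the sample IDs are made up for the example. It shuffles samples within each condition and deals them out in turn, so every batch receives a mix of conditions instead of one condition per day.

```python
import random
from collections import defaultdict


def assign_batches(samples, n_batches, seed=0):
    """Spread each condition across batches in random order.

    samples: list of (sample_id, condition) tuples (hypothetical format).
    Returns a dict mapping batch index -> list of samples.
    """
    rng = random.Random(seed)
    by_condition = defaultdict(list)
    for sample in samples:
        by_condition[sample[1]].append(sample)

    batches = {i: [] for i in range(n_batches)}
    slot = 0
    for condition in sorted(by_condition):
        group = by_condition[condition]
        rng.shuffle(group)  # random order within the condition
        for sample in group:
            batches[slot % n_batches].append(sample)  # deal out in turn
            slot += 1
    return batches


if __name__ == "__main__":
    samples = [(f"A{i}", "A") for i in range(6)] + [(f"B{i}", "B") for i in range(6)]
    for day, members in assign_batches(samples, n_batches=2).items():
        print("day", day, [sample_id for sample_id, _ in members])
```

Showing the resulting table next to the "all A on Monday, all B on Tuesday" layout makes the confounding easy to see.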

I have one last word, on technical replicates. Even if it seems obvious to include some in your experiments, it is not always (often?) done. I have never faced such a case myself, but I am sure you can find examples of poorly correlated technical replicates. Contrasting a pair of highly correlated technical replicates (rho > 0.95) with a poorly correlated pair (a simple scatter plot of each might be enough) should also interest your audience. Actually, this might even be mentioned before the biological replicates, since it is the most informative experiment you can perform to assess the reproducibility of your protocol (even though that should not be a problem with a standard platform such as Affymetrix).
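
For the replicate comparison, the correlation itself is simple to compute. The sketch below uses simulated data, not real arrays, and all numbers in it are assumptions: 500 genes, a shared log2 signal, and two noise levels standing in for a good and a poor pair of technical replicates. It computes Pearson's correlation by hand; a rank-based (Spearman) coefficient could be substituted if that is what "rho" refers to.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def simulate_replicate_pair(noise_sd, n_genes=500, seed=0):
    """Two technical replicates of the same sample: shared signal plus
    independent measurement noise (assumed toy model)."""
    rng = random.Random(seed)
    signal = [rng.gauss(8.0, 2.0) for _ in range(n_genes)]
    rep1 = [s + rng.gauss(0.0, noise_sd) for s in signal]
    rep2 = [s + rng.gauss(0.0, noise_sd) for s in signal]
    return rep1, rep2


if __name__ == "__main__":
    good = pearson(*simulate_replicate_pair(noise_sd=0.2))
    poor = pearson(*simulate_replicate_pair(noise_sd=2.0))
    print("well-behaved replicates:", round(good, 3))
    print("noisy replicates:", round(poor, 3))
```

Plotting rep1 against rep2 for each pair gives the two scatter plots to show side by side.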

I guess these few examples can be illustrated with simple plots or notions without requiring complex statistical explanations. I am sorry I could not provide more concrete material (such as plots), but it should be easy to find on the internet, or to produce if you have some extensive microarray datasets you can use.

score 3 · Answer 3 · 2012-02-10

Toward the end of my PhD on plant molecular biology, I read this review: "A discussion of statistical methods for design and analysis of microarray experiments for plant scientists." I found it very clear and nice, and I wish it had been written when I started dealing with microarrays rather than when my thesis was nearly ready. It is for "plant scientists", but I think you can substitute mouse for plant and it will still hold. And, anyway, the examples don't matter that much.