Question

Calculating sample size for an RNA-seq experiment

8.2 years ago

roser.navarro • 0

Dear all,

I've read some papers and vignettes about calculating the sample size for RNA-seq experiments.

I've also tried some tools such as RNAseqPS, rnaseqpower and sspa, but all of them only handle comparisons between 2 groups.

My problem is that our experimental design compares more than 2 groups.

We take samples from patients at 5 different time points.

From each patient we take 1 biopsy and 1 liquid sample. From the biopsy we take 3 different samples (different parts of the biopsy, which are expected to have different transcriptomic profiles). In summary, from each patient we have 4 samples: 3 from the biopsy and 1 from the liquid.

We want to compare samples within each time point and also samples between different time points (see the scheme).

                 T1    T2    T3    T4    T5   (time point)
Sample type 1    S1    S1    S1    S1    S1
Sample type 2    S2    S2    S2    S2    S2
Sample type 3    S3    S3    S3    S3    S3
Sample type 4    S4    S4    S4    S4    S4
Sample size      N?    N?    N?    N?    N?   (for each group)

Differential expression analysis will be performed both horizontally and vertically (between time points and between sample types).

As far as I can see, the approaches above only compare 2 groups (A vs B).

How can we deal with this problem?

Could we calculate the sample size for 2 groups and multiply N by 5? Or should we increase the sample size further?

We are not only comparing A vs B: we compare A vs B, A vs C, A vs D, A vs E, B vs C, etc. So I guess this is a more complex multiple-testing (FDR) problem that I don't know how to solve.
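To get a feel for how many tests the scheme implies, here is a small sketch that enumerates them. It assumes (my reading, not stated explicitly in the question) that "horizontal" means every pair of time points within each sample type and "vertical" means every pair of sample types within each time point; the labels are hypothetical. The Bonferroni-style threshold is only there to show how fast the per-contrast alpha shrinks; Benjamini-Hochberg FDR control, the usual choice for RNA-seq, is less conservative.

```python
from itertools import combinations

# Hypothetical labels for the design: 5 time points, 4 sample types
# (3 biopsy regions + 1 liquid sample).
time_points = ["T1", "T2", "T3", "T4", "T5"]
sample_types = ["S1", "S2", "S3", "S4"]

# Horizontal contrasts: every pair of time points, within each sample type.
between_time = [
    (st, a, b) for st in sample_types for a, b in combinations(time_points, 2)
]
# Vertical contrasts: every pair of sample types, within each time point.
between_type = [
    (tp, a, b) for tp in time_points for a, b in combinations(sample_types, 2)
]

n_contrasts = len(between_time) + len(between_type)
print(len(between_time), len(between_type), n_contrasts)  # 40 30 70

# Bonferroni-style per-contrast alpha (illustration only; this ignores the
# additional multiplicity from testing thousands of genes per contrast).
alpha = 0.05
print(alpha / n_contrasts)
```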

Any help/advice will be welcome.

Best regards, and thanks in advance

Tags: RNA-Seq, power, complex-design, sample-size

Updated 21 months ago by Ram • written 8.2 years ago by roser.navarro

Ram · Answer 1 · 2016-02-16

8.2 years ago

Carlo Yague 8.7k

To take into account both the different sample types/patients and the time course, I suggest you try DESeq2 for your differential expression analysis. There is more information in the DESeq2 documentation and, specifically about time-course analysis, here: http://www.bioconductor.org/help/workflows/rnaseqGene/#time

With DESeq2, you build an explicit model in which every variable is specified for each sample, like this:

sample    time    tissue    patient
1         0       A         I
2         0       B         I
3         0       A         II

In that kind of model, the number of samples in each condition or combination of conditions is explicit and you don't need to have a "sample size" parameter.
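As a sketch of what such a sample table could look like for the whole study, the snippet below builds it with one row per library. It assumes each patient is sampled at every time point; the patient count and tissue names are hypothetical. The resulting CSV could be read into R as the colData for DESeq2, with one possible design formula being `~ patient + tissue + time`.

```python
import csv
import io
from itertools import product

# Hypothetical design: 3 patients, 5 time points, 4 sample types per visit
# (3 biopsy regions + 1 liquid sample).
patients = ["P1", "P2", "P3"]
time_points = ["T1", "T2", "T3", "T4", "T5"]
tissues = ["biopsyA", "biopsyB", "biopsyC", "liquid"]

# One row per RNA-seq library, every variable explicit.
rows = [
    {"sample": f"{p}_{t}_{s}", "time": t, "tissue": s, "patient": p}
    for p, t, s in product(patients, time_points, tissues)
]

# Write the table as CSV (here into memory instead of a file).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["sample", "time", "tissue", "patient"])
writer.writeheader()
writer.writerows(rows)

print(len(rows))  # 60 = 3 patients x 5 time points x 4 tissues
print(buf.getvalue().splitlines()[1])  # P1_T1_biopsyA,T1,biopsyA,P1
```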

PS: I assume that by "sample size" you mean the number of patients in each group, not the sizeFactor or the sequencing depth of your RNA-seq data.

Updated 21 months ago by Ram • written 8.2 years ago by Carlo Yague

Dear Carlo,

Thanks for your answer :-)

But my problem is not related to the differential expression analysis.

What I need to know is the number of patients per group that I have to include in my study, given that:

I want a statistical power of 0.9 and a significance level (alpha) of 0.05, with a sequencing depth of 5M reads per sample.
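For a single gene in a two-group comparison, a commonly used normal approximation for negative binomial counts is n = 2(z_{1-α/2} + z_{1-β})² (1/μ + CV²) / (ln Δ)², where μ is the expected read count for that gene, CV the biological coefficient of variation and Δ the fold change. I believe this is the approach behind the RNASeqPower package, but treat that as an assumption. The sketch below is not a multi-group method; note also that μ is a per-gene count, so the 5M reads per library must first be translated into an expected count for the genes of interest. All input values are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(depth, cv, fold_change, alpha=0.05, power=0.9):
    """Approximate samples per group for a two-group test of one gene.

    depth       -- expected read count for the gene per sample (NOT library size)
    cv          -- biological coefficient of variation between replicates
    fold_change -- fold change to detect, e.g. 2.0
    alpha       -- two-sided significance level
    power       -- desired power (1 - beta)
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    # Variance on the log scale: Poisson part (1/depth) plus biological part (cv^2).
    variance = 1 / depth + cv ** 2
    n = 2 * (z_alpha + z_beta) ** 2 * variance / math.log(fold_change) ** 2
    return math.ceil(n)


# Hypothetical inputs: 10 reads for the gene, CV of 0.4, 2-fold change.
print(n_per_group(depth=10, cv=0.4, fold_change=2.0))  # 12
# Same inputs with a Bonferroni-style alpha, if all 70 pairwise contrasts
# (10 per sample type x 4 + 6 per time point x 5) were tested.
print(n_per_group(depth=10, cv=0.4, fold_change=2.0, alpha=0.05 / 70))
```

The second call shows why simply multiplying a two-group N by the number of groups does not account for the multiple comparisons: the stricter alpha raises N per group on its own.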

Updated 21 months ago by Ram • written 8.2 years ago by roser.navarro

Oh, sorry, I misunderstood: I thought you already had your data. As far as I know, there is no easy answer to your question.

I'm not sure, but I think that the sample size per group you need is in fact lower than or equal to the sample size required for a pairwise comparison. Why? Because the different tissue types and time points from the same patient can be considered partial replicates.

Another way to look at it: with X patients per group, a pairwise comparison uses 2X samples, but the whole study has 20X samples (4 tissues × 5 time points per patient). Having so many RNA-seq samples will give you better estimates of the mean and dispersion of the expression levels, leading to a more powerful analysis.
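The 2X versus 20X arithmetic, as a tiny sketch. The helper name is hypothetical, and it assumes each of the 20 tissue/time-point cells holds X samples, so that a pairwise test uses 2 cells while dispersion can be estimated from all of them.

```python
def sample_counts(patients_per_group, n_tissues=4, n_time_points=5):
    """Hypothetical helper: samples in one pairwise test vs. the whole study."""
    pairwise = 2 * patients_per_group  # two groups compared directly
    whole_study = n_tissues * n_time_points * patients_per_group  # all cells
    return pairwise, whole_study


for x in (3, 5, 10):
    pairwise, total = sample_counts(x)
    print(f"X={x}: {pairwise} samples in a pairwise comparison, {total} in the study")
```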

In any case, budget permitting, the more samples the better! I hope this helps, and sorry for the misunderstanding before.

Updated 21 months ago by Ram • written 8.2 years ago by Carlo Yague