Question: calculate sample size for RNAseq experiment
3.3 years ago by
roser.navarro

Dear all,

I've read some papers and vignettes on calculating the sample size for RNAseq experiments.

I've also tried some tools such as RNAseqPS, rnaseqpower and sspa, but all of them work with only 2 groups.

I have a problem because in our experimental design we are comparing more than 2 groups.

We take samples from patients at 5 different time points.

From each patient we take 1 biopsy and 1 liquid sample. From the biopsy we take 3 different samples (different parts of the biopsy which should have different transcriptomic profiles). So, in summary: from each patient we have 4 samples (3 from the biopsy and 1 from the liquid).

We want to compare samples within each time point and also samples between different time points (see the scheme).

T1    T2    T3    T4    T5 (time point)

S1    S1    S1    S1    S1  (sample type 1)

S2    S2    S2    S2    S2

S3    S3    S3    S3    S3

S4    S4    S4    S4    S4

N?    N?    N?    N?    N? (sample size for each group)

Differential expression analysis will be performed both horizontally and vertically (between time points and between sample types).

I've seen that the approaches above compare only 2 groups (A vs B).

How can we deal with this problem?

Could we calculate sample size for 2 groups and multiply the N by 5? Or should we increase the sample size?

We are not just comparing A vs B: we compare A vs B, A vs C, A vs D, A vs E, B vs C, etc. So I guess this is a more complex problem (multiple comparisons, FDR) that I don't know how to solve.

Best regards and thanks in advance
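To make the size of the multiple-comparison problem concrete, here is a small Python sketch that counts the pairwise contrasts implied by the scheme above. The labels T1..T5 and S1..S4 come from the scheme; the Bonferroni division at the end is only a conservative illustration and an assumption on my part, since RNA-seq tools usually control the FDR across genes rather than across contrasts.

```python
from itertools import combinations

time_points = ["T1", "T2", "T3", "T4", "T5"]
sample_types = ["S1", "S2", "S3", "S4"]

# Horizontal: for one sample type, every pair of time points.
time_pairs = list(combinations(time_points, 2))
# Vertical: within one time point, every pair of sample types.
type_pairs = list(combinations(sample_types, 2))

horizontal = len(time_pairs) * len(sample_types)  # time-point pairs, per sample type
vertical = len(type_pairs) * len(time_points)     # sample-type pairs, per time point
total_contrasts = horizontal + vertical

# Illustrative only: a Bonferroni-style per-contrast significance level.
alpha = 0.05
per_contrast_alpha = alpha / total_contrasts

print(f"time-point pairs per sample type: {len(time_pairs)}")
print(f"sample-type pairs per time point: {len(type_pairs)}")
print(f"total pairwise contrasts: {total_contrasts}")
print(f"Bonferroni-style alpha per contrast: {per_contrast_alpha:.5f}")
```

The count grows quickly with the number of groups, which is why simply multiplying a two-group N by 5 does not by itself account for the extra comparisons.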

3.3 years ago by
Carlo Yague
Belgium

To take into account both the different sample types/patients and the time course, I suggest you try DESeq2 for your differential expression analysis. More information is in the documentation and, specifically about time course analysis, here: http://www.bioconductor.org/help/workflows/rnaseqGene/#time

With DESeq2, you build an explicit model from a sample table in which every variable is specified for each sample, like this:

```
sample    time    tissue    patient
1         0       A         I
2         0       B         I
3         0       A         II
```

In that kind of model, the number of samples in each condition or combination of conditions is explicit and you don't need to have a "sample size" parameter.
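As a rough illustration, a full sample table of this kind can be generated and checked in a few lines of Python. This is only a sketch: the column names follow the table above, while the number of patients (3), the tissue labels A to D, the time labels 0 to 4, and the assumption that every patient is sampled at every time point are all hypothetical.

```python
from collections import Counter
from itertools import product

times = [0, 1, 2, 3, 4]                # 5 time points
tissues = ["A", "B", "C", "D"]         # 3 biopsy parts + 1 liquid sample
patients = ["I", "II", "III"]          # hypothetical: 3 patients

# One row per sequenced sample, with every variable stated explicitly.
rows = [
    {"sample": i, "time": t, "tissue": s, "patient": p}
    for i, (p, t, s) in enumerate(product(patients, times, tissues), start=1)
]

# Number of samples in each time x tissue combination (the "N?" of the scheme).
per_group = Counter((r["time"], r["tissue"]) for r in rows)

print("sample\ttime\ttissue\tpatient")
for r in rows[:4]:
    print(f'{r["sample"]}\t{r["time"]}\t{r["tissue"]}\t{r["patient"]}')
print(f"total samples: {len(rows)}, groups: {len(per_group)}")
```

Counting rows per time/tissue combination makes the replication in each cell of the design visible before any sequencing is done.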

PS: I assume that in your question "sample size" means the number of patients in each group, not the sizeFactor or coverage of your RNA-seq data.

Dear Carlo,

But my problem is not related to the differential expression analysis.

What I need to know is the number of patients per group that I have to include in my study, knowing that:

I want a statistical power of 0.9 at a significance level (alpha) of 0.05, with a coverage of 5M reads.
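For reference, the two-group calculation that such tools perform can be sketched in Python. This is a minimal sketch assuming the normal-approximation formula that, as far as I understand, underlies RNASeqPower: n = 2 (z_{1-alpha/2} + z_{power})^2 (1/depth + CV^2) / ln(effect)^2. The per-gene depth, the coefficient of variation and the fold change used below are hypothetical placeholders, not values from this thread; note that `depth` here is the coverage of a single gene, not the 5M total reads per library.

```python
from math import ceil, log
from statistics import NormalDist


def samples_per_group(depth, cv, effect, alpha=0.05, power=0.9):
    """Approximate samples per group for a two-group comparison.

    depth  : average read count for the gene of interest (assumed value)
    cv     : biological coefficient of variation (assumed value)
    effect : fold change to detect, e.g. 2.0 (assumed value)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * (1 / depth + cv ** 2) / log(effect) ** 2
    return ceil(n)


# Hypothetical inputs: 20 reads per gene, CV of 0.4, 2-fold change.
n_two_fold = samples_per_group(depth=20, cv=0.4, effect=2.0)
n_four_fold = samples_per_group(depth=20, cv=0.4, effect=4.0)
print(n_two_fold, n_four_fold)
```

This only covers a single A vs B contrast for a single gene; it does not account for the paired, multi-group structure of the design described above.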

I'm not sure, but I think that the sample size per group you need is in fact lower than or equal to the sample size required for a pairwise comparison. Why? Because the different tissues/time points from the same patient can be considered partial replicates.

Another way to look at it is that with X patients per group, you will have 2X samples in a pairwise comparison, but 20X samples for the whole study (4 tissues × 5 time points per patient). Having so many RNA-seq samples will give you better estimates of the mean and dispersion of the expression level, leading to a more powerful analysis.

In any case, depending on your budget, the more samples, the better! I hope this helps, and sorry for the earlier misunderstanding.