Question: DESeq2 for differential expression analysis of RNA-seq data with few samples per condition?
amyfm (Ireland) wrote, 5.0 years ago:

Hi, I would like to do a differential expression analysis of RNA-seq data between 2 conditions, and I only have 2-3 samples per condition. Would you recommend DESeq2 for this case, or another method? Would the results be reliable? I had more samples, but I had to remove them from my analysis because of poor quality.

Tags: rna-seq, deseq2, samples, replicates

I would say that's a very common situation for RNA-seq. You can have a look at this paper to get an idea of how accurate the methods are. (DESeq2 had not been developed yet at the time; I would guess that it performs a bit better than DESeq.)

Reply by Martombo, 5.0 years ago

Thank you. According to that paper, DESeq (and, I imagine, DESeq2) performs better than other methods when only a few replicates are available.

Reply by amyfm, 5.0 years ago

My samples are ovarian tissue from 2 different conditions (high and low fertility) and, as I said, there are 2-3 replicates per condition.

When I run the differential expression analysis on only 2 samples (1 vs 1), I obtain a smaller list of genes than when I run it on 5 samples (2 vs 3). This makes me think that, in this case, using few replicates is not going to give me many false positives; otherwise the 1 vs 1 analysis would give me a large list of unreliable genes, and it doesn't.

Reply by amyfm, 5.0 years ago

Basically, the more biological replicates the better. The more samples you have, the better you can estimate variance and thus reliably report statistically robust observations. If you used DESeq2 on "1 v 1" samples, the software would act very conservatively, which suggests that, if you are getting statistically significant results, the biological effect is quite strong. Adding more replicates means that you will likely get more statistically significant results.

Reply by andrew.j.skelton73, 5.0 years ago
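A small simulation can make the point about replicates and variance estimation concrete. This is a toy sketch under stated assumptions, not DESeq2's model: it draws normally distributed values, whereas DESeq2 fits negative binomial counts and shrinks per-gene dispersion estimates by sharing information across genes, so real estimates are less noisy than this suggests. The function name `fraction_badly_underestimated` and the "below half the true variance" criterion are hypothetical, chosen only for illustration. It counts how often the sample variance from n replicates falls below half of the true variance.

```python
import random
import statistics


def fraction_badly_underestimated(n_reps, true_sd=1.0, trials=20000, seed=1):
    """Fraction of simulated experiments whose sample variance is below
    half of the true variance, for a given number of replicates.

    Toy assumption: replicate values are normally distributed."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    true_var = true_sd ** 2
    low = 0
    for _ in range(trials):
        values = [rng.gauss(0.0, true_sd) for _ in range(n_reps)]
        # statistics.variance uses the n - 1 (sample) denominator
        if statistics.variance(values) < 0.5 * true_var:
            low += 1
    return low / trials


for n in (2, 3, 6):
    print(n, round(fraction_badly_underestimated(n), 3))
```

In theory these fractions are about 0.52, 0.39 and 0.22 for 2, 3 and 6 replicates, so with 2 replicates roughly every other experiment underestimates the variance by more than half, which shows how noisy the variance estimate is with very few samples.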

The number of false positives should be the same no matter how many replicates you use. What changes is the number of true positives and the percentage of calls that are correct. The expected fraction of genes falsely called at a given p-value cutoff is the cutoff itself (i.e. 1% of the genes are falsely called at p < 0.01). Your false discovery rate (the percentage of all your calls that are false positives) will decrease with more replicates because you will be discovering more true positives. So if you call 300 of 30,000 genes with no replicates at p < 0.01, all of your calls are expected to be bad. If you call 600 genes with 2 replicates at p < 0.01, half of your calls are expected to be good! But the other half is still bad. Of course, you can't really calculate a valid p-value with one replicate, because you are only guessing what the variance is. Usually you want to choose a p-value cutoff restrictive enough that your expected FDR is around 10% or lower, but that is not always possible with low-powered experiments.

Reply by Michele Busby, 5.0 years ago
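The arithmetic in Michele's reply can be written out directly. This is a minimal sketch, assuming (as the reply does) that the false-call rate applies to every tested gene; strictly it applies only to genes with no true change, so `expected_false_positives` is an upper bound. The helper names are hypothetical. `benjamini_hochberg` is the standard Benjamini-Hochberg step-up adjustment, one common way to pick a cutoff for a target FDR such as 10%. DESeq2's `results()` reports BH-adjusted p-values as `padj` by default, but it also applies independent filtering, so its numbers will not match this function exactly.

```python
def expected_false_positives(n_genes, p_cutoff):
    """Expected false calls if every tested gene were truly unchanged (upper bound)."""
    return n_genes * p_cutoff


def estimated_fdr(n_genes, p_cutoff, n_called):
    """Rough FDR estimate: expected false calls / total calls, capped at 1."""
    if n_called == 0:
        return 0.0
    return min(1.0, expected_false_positives(n_genes, p_cutoff) / n_called)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(expected_false_positives(30000, 0.01))  # 300.0
print(estimated_fdr(30000, 0.01, 300))        # 1.0  (all 300 calls expected false)
print(estimated_fdr(30000, 0.01, 600))        # 0.5  (half expected false)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Genes whose adjusted value is below 0.1 would be called at an estimated FDR of about 10%.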

Thank you very much for your help.

When I run my differential expression analysis with 2 vs 3 animals, I obtain fewer DEGs than when I run it with 3 vs 3 (in the latter case I am adding one sample of regular quality; the others are of good quality). My aim is to obtain as many DEGs as possible. Which analysis do you think I should use, i.e. which would be more reliable: 2 vs 3 with good-quality samples and more DEGs, or 3 vs 3 with one regular-quality sample and fewer DEGs?

Reply by amyfm, 5.0 years ago
andrew.j.skelton73 (London) wrote, 5.0 years ago:

It depends on what the samples are. You can still run them through DESeq2, since you have biological replicates; the analysis may just be underpowered, depending on what you're looking for.

DESeq2 is good for gene-level analysis. For transcript-level analysis, use Salmon or Kallisto together with Sleuth.

marina.kimyr (United States) wrote, 5.0 years ago:

I have published papers where we only had 3 replicates per condition in a control vs. condition setting. As long as you have more than 2 replicates, you can technically use DESeq2 for the differential analysis, and you may discuss the limited number of replicates in your paper if you feel it is worth mentioning.

Good luck


Thank you very much! By "more than 2 replicates", do you mean 2 replicates included?

Reply by amyfm, 5.0 years ago

I meant at least 2 replicates for the control and at least 2 replicates for the sample :)

Reply by marina.kimyr, 5.0 years ago