Differential gene expression analysis: how different can the number of samples be for the two conditions being compared?
5.7 years ago
mmitra ▴ 60

Hi all, I have a basic question about differential gene expression analysis (DESeq2) between two conditions (say 1 and 2). If I have 3 samples for condition 1 and 60 samples for condition 2, is it fine to run the differential expression analysis between conditions 1 and 2 as they are? Or do I need to randomly select fewer samples from condition 2 to get a more "balanced" analysis? Are there any statistical problems associated with such an imbalance? If I do need to select fewer samples, how many samples from condition 2 should be selected for the analysis?

Thanks in advance for any suggestions. I really appreciate your help.

rna-seq deseq2 number of samples • 2.5k views

I see this issue has been raised on Bioconductor (e.g. here). I'm not a statistician and am interested to hear other views, but I'd say the DE methods in DESeq2 are valid for unbalanced groups; they may just be less efficient than a balanced design with the same total sample size would be. You have a very large imbalance, so I'd guess your variance estimates will be driven mostly by the variance in the larger (n=60) group. Having said that, DESeq2 also shares variance information across genes. I think you could certainly proceed with all samples and not down-sample to equalize group sizes. But I would want to inspect your data carefully using MA-plots etc. to confirm you are not seeing any group-size-driven artifacts among the genes called DE.
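To put rough numbers on the argument above, here is a minimal sketch in plain Python. It is not DESeq2 and not how DESeq2 computes anything; it uses the textbook two-group comparison with a common variance as a stand-in, which is an assumption made only for illustration. The function names `se_factor` and `pooled_weight_of_larger` are hypothetical helpers introduced here.

```python
import math


def se_factor(n1, n2):
    """Standard error of a difference in group means, in units of the
    common standard deviation (assumes equal variance in both groups)."""
    return math.sqrt(1.0 / n1 + 1.0 / n2)


def pooled_weight_of_larger(n1, n2):
    """Share of the pooled-variance degrees of freedom that comes from
    the larger group (classical pooled variance, not DESeq2 dispersion)."""
    big, small = max(n1, n2), min(n1, n2)
    return (big - 1) / (big + small - 2)


# Designs discussed in the thread, plus a near-balanced split of 63 samples.
designs = {
    "all samples (3 vs 60)": (3, 60),
    "down-sampled (3 vs 3)": (3, 3),
    "balanced, same total (31 vs 32)": (31, 32),
}

for label, (n1, n2) in designs.items():
    print(f"{label}: SE factor = {se_factor(n1, n2):.3f}")

print(
    "weight of the n=60 group in the pooled variance: "
    f"{pooled_weight_of_larger(3, 60):.3f}"
)
```

Under these simplifying assumptions, keeping all 60 samples gives a smaller standard error than down-sampling to 3 vs 3 (a factor of roughly 0.59 versus 0.82), while a balanced design with the same total would do better still. The pooled variance is also dominated by the larger group, which is the reason for checking the results visually.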


Thanks so much for your suggestions. They are very helpful.
