Question: Differential gene expression analysis: how different can the number of samples be for the two conditions being compared?
mmitra (Los Angeles, United States) wrote 10 weeks ago:

Hi all, I have a basic question about differential gene expression analysis with DESeq2 between two conditions (say 1 and 2). If I have 3 samples for condition 1 and 60 samples for condition 2, is it fine to run the differential expression analysis between conditions 1 and 2? Or do I need to randomly select fewer samples from condition 2 to get a more "balanced" analysis? Are there any statistical problems associated with such an imbalance? If I do need to select fewer samples, how many samples from condition 2 should I select?

Thanks in advance for any suggestions. I really appreciate your help.

written 10 weeks ago by mmitra

I see this issue has been raised on Bioconductor (e.g. here). I'm not a statistician and would be interested to hear other views, but I'd say the DE methods in DESeq2 are valid for unbalanced groups, although they may be less efficient than a balanced design with the same total sample size. Your imbalance is very large, so I'd guess your variance estimates will be driven mostly by the larger (n=60) group. Having said that, DESeq2 shares variance information across genes. I think you can certainly proceed with all samples rather than down-sampling to equalize group sizes. But I would visualize your data carefully, using MA-plots etc., to confirm that the genes called DE do not show any artifacts driven by group size.
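To put rough numbers on the point about efficiency and down-sampling, here is a minimal sketch. It is not DESeq2's negative binomial model. It only uses the textbook standard error of a difference between two group means, under the simplifying assumption that both groups share the same standard deviation sigma. The function name `se_of_difference` and the three designs are made up for illustration.

```python
import math

def se_of_difference(n1, n2, sigma=1.0):
    """Standard error of the difference between two group means,
    assuming (as a simplification) a common standard deviation sigma."""
    return sigma * math.sqrt(1.0 / n1 + 1.0 / n2)

# Hypothetical designs to compare; only the first matches the question.
designs = {
    "all samples (3 vs 60)": (3, 60),
    "down-sampled (3 vs 3)": (3, 3),
    "balanced, same total (31 vs 32)": (31, 32),
}

results = {name: se_of_difference(n1, n2) for name, (n1, n2) in designs.items()}

for name, se in results.items():
    print(f"{name}: SE = {se:.3f} x sigma")
```

Under this simplified model, keeping all 60 samples gives a smaller standard error than down-sampling to 3 vs 3, while a balanced design with the same total would do better still. The 3-sample group dominates the uncertainty of the comparison either way, so discarding samples from the larger group only loses information.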

modified 10 weeks ago • written 10 weeks ago by Ahill

Thanks so much for your suggestions. They are very helpful.

written 10 weeks ago by mmitra