Question: Calculating p-values when the experimental group has replicates but the control group has none
8 weeks ago by
zhangdengwei40 wrote:


I am new to bioinformatics. I have a simple question about how to calculate p-values for my RNA-Seq data, in which the condition group has replicates but the control group does not:

Condition: A1, A2, A3
Control: B

My understanding is that DESeq2 requires replicates, whereas edgeR supports samples without replicates. Which package should I use? I am a bit confused.

Thanks for your help!

edger rna-seq deseq2 R • 160 views
8 weeks ago by
Belgium, Brussels
Nicolas Rosewick8.3k wrote:

From the DESeq2 vignette:

Can I use DESeq2 to analyze a dataset without replicates?

If a DESeqDataSet is provided with an experimental design without replicates, a warning is printed, that the samples are treated as replicates for estimation of dispersion. This kind of analysis is only useful for exploring the data, but will not provide the kind of proper statistical inference on differences between groups. Without biological replicates, it is not possible to estimate the biological variability of each gene. More details can be found in the manual page for ?DESeq.

So be careful when interpreting your results ;)
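
To make the vignette's point concrete for the 3-vs-1 layout in the question, here is a minimal sketch (plain Python, not DESeq2 code) that counts the residual degrees of freedom each group contributes in a one-factor design. The function name `residual_df_by_group` is made up for illustration. The assumption is the usual one: a group of n samples contributes n - 1 degrees of freedom for estimating variability.

```python
from collections import Counter

def residual_df_by_group(labels):
    """Residual degrees of freedom contributed by each group
    in a one-factor design (n - 1 per group)."""
    counts = Counter(labels)
    return {group: n - 1 for group, n in counts.items()}

# Layout from the question: three replicates of A, one sample of B.
labels = ["A", "A", "A", "B"]
per_group = residual_df_by_group(labels)

print(per_group)                # {'A': 2, 'B': 0}
print(sum(per_group.values()))  # 2
```

With one sample per group the total would be 0, which is the no-replicate case the vignette describes. Here the total is 2, so a variability estimate exists, but all of it comes from group A and none from B.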

8 weeks ago by
ATpoint25k wrote:

To the best of my knowledge, DESeq2 will apply the dispersion estimated from the replicated group to the unreplicated group. So if you are willing to assume that the dispersion in the replicated group is representative of the other group, you can simply try running it. If, for example, A is a cancer sample and B is normal, then the dispersion in A is probably much larger than in B. In that case you would overestimate the dispersion for B and therefore detect fewer differential genes than there actually are. This might be OK, as it at least avoids false positives (it depends on your scientific question). Conversely, if B were the cancer sample, you would probably strongly underestimate its dispersion and get a lot of false positives. What are these samples? Decide for yourself whether the above assumption holds for your data.
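
The over/underestimation argument above can be checked with a small simulation. This is a toy sketch under loud assumptions: values are Gaussian (think log-expression) rather than negative binomial counts, each gene is tested on its own with a pooled-variance t-test, and nothing is shared across genes. It is not what DESeq2 or edgeR actually compute. The function name and parameters are invented for illustration. No gene is truly differential, so every call is a false positive.

```python
import random
import statistics

# Two-sided 5% critical value of Student's t with 2 degrees of freedom.
T_CRIT_DF2 = 4.303

def false_positive_rate(sd_a, sd_b, n_genes=20000, seed=1):
    """Fraction of null genes called significant when the variance is
    estimated from the three A replicates only and reused for B."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_genes):
        a = [rng.gauss(0.0, sd_a) for _ in range(3)]  # replicated group
        b = rng.gauss(0.0, sd_b)                      # single control sample
        s_a = statistics.stdev(a)                     # only source of variance
        t = (statistics.mean(a) - b) / (s_a * (1 / 3 + 1) ** 0.5)
        if abs(t) > T_CRIT_DF2:
            hits += 1
    return hits / n_genes

print("same variability:   ", false_positive_rate(1.0, 1.0))
print("B more variable:    ", false_positive_rate(1.0, 3.0))
print("B less variable:    ", false_positive_rate(3.0, 1.0))
```

When both groups have the same variability, the rate should sit near the nominal 5%. When B is more variable than A, the borrowed variance is too small and false positives are inflated. When B is less variable, the test becomes conservative.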

Edit: See here the statements of the DESeq2 and edgeR people on that matter:


Thanks for your detailed explanation; I learned a lot. In my study, A is the patient with trisomy 21 and B is normal. I am confused about why the dispersion in A would probably be much larger than in B if A were a cancer sample. Could you explain this further? Thanks very much.

Reply written 8 weeks ago by zhangdengwei