Question

Calculating p-values when the experimental group has replicates but the control group has none


4.6 years ago

zhangdengwei ▴ 210

Hi,

I am new to bioinformatics. I have a simple question about how to calculate p-values for my RNA-Seq data, which has a condition group with replicates and a control group without replicates, like the following:

Condition: A1, A2, A3
Control: B

As I understand it, DESeq2 requires replicates, whereas edgeR can handle samples without replicates. Which package should I use? I am a bit confused.

Thanks for your help!

R RNA-Seq DESeq2 edgeR • 1.5k views

updated 4.6 years ago by Nicolas Rosewick 10k • written 4.6 years ago by zhangdengwei ▴ 210

score 2 · Answer 1 · 2019-09-17

From the DESeq2 vignette: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#can-i-use-deseq2-to-analyze-a-dataset-without-replicates

Can I use DESeq2 to analyze a dataset without replicates?

If a DESeqDataSet is provided with an experimental design without replicates, a warning is printed, that the samples are treated as replicates for estimation of dispersion. This kind of analysis is only useful for exploring the data, but will not provide the kind of proper statistical inference on differences between groups. Without biological replicates, it is not possible to estimate the biological variability of each gene. More details can be found in the manual page for ?DESeq.

So be careful when interpreting your results ;)

score 1 · Answer 2 · 2019-09-17

To the best of my knowledge, DESeq2 at least will apply the dispersion estimate from the replicated group to the unreplicated group. So if you are willing to assume that the dispersion in the replicated group is representative of the other group, you could simply try running it. If, for example, A is a cancer sample and B is normal, then the dispersion in A is probably much larger than in B. In that case you would overestimate the dispersion for B and therefore detect fewer differential genes than there actually are. This might be acceptable, since it at least avoids false positives (depending on your scientific question). Conversely, if B were the cancer sample, you would probably strongly underestimate its dispersion and get a lot of false positives. What are these samples? Decide for yourself whether the above assumption holds for your data.
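
To make the argument about borrowed dispersion concrete, here is a small numeric sketch. It is only an illustration and not what DESeq2 or edgeR actually compute: it assumes a method-of-moments estimate of the negative binomial dispersion (variance = mean + dispersion * mean^2) and a delta-method standard error for the log fold change. The counts, the function names `nb_dispersion` and `wald_z`, and the "tight" dispersion of 0.01 for B are all made up for the example.

```python
import math
import statistics


def nb_dispersion(counts):
    """Method-of-moments dispersion under var = mu + alpha * mu^2.

    Hypothetical helper for illustration; needs at least two replicates
    because a sample variance cannot be computed from a single value.
    """
    if len(counts) < 2:
        raise ValueError("need at least two replicates to estimate variance")
    mu = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance (n - 1 denominator)
    return max((var - mu) / mu ** 2, 0.0)


def wald_z(counts_a, count_b, alpha):
    """Approximate z statistic for log(mean A / B) with one shared dispersion.

    Uses the delta-method approximation Var(log mean) ~ (1/mu + alpha) / n.
    """
    mu_a = statistics.mean(counts_a)
    mu_b = count_b
    log_fold_change = math.log(mu_a / mu_b)
    se = math.sqrt((1 / mu_a + alpha) / len(counts_a) + (1 / mu_b + alpha) / 1)
    return log_fold_change / se


# Made-up counts for one gene: three replicates of A, a single B sample.
a_counts = [100, 180, 260]
b_count = 60

# Dispersion can only be estimated from the replicated group A.
alpha_a = nb_dispersion(a_counts)

# Case 1: B inherits A's dispersion (what happens when B has no replicates).
z_borrowed = wald_z(a_counts, b_count, alpha_a)

# Case 2: suppose B were in truth much less variable (assumed alpha = 0.01).
z_tight = wald_z(a_counts, b_count, 0.01)

print(f"dispersion estimated from A: {alpha_a:.3f}")
print(f"z with A's dispersion applied to B: {z_borrowed:.2f}")
print(f"z if B had a small dispersion:      {z_tight:.2f}")
```

With these made-up counts, the z statistic is smaller when B inherits A's larger dispersion than when B is given a tighter one, which is the conservative direction described above. If the unreplicated sample were the more variable one, the borrowed dispersion would be too small and the statistic too optimistic.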

Edit: See what the DESeq2 and edgeR people say on the matter here: https://support.bioconductor.org/p/63585/