Calculating p-values when the experimental group has replicates but the control group does not
4.6 years ago
zhangdengwei ▴ 210

Hi,

I am a novice in bioinformatics. I have a simple question about how to calculate p-values for my RNA-Seq data, which has a condition group with replicates and a control group without replicates, as follows:

Condition: A1, A2, A3
Control: B

As I understand it, DESeq2 requires replicates, whereas edgeR can handle samples without replicates. Which package should I use? I am a bit confused.

Thanks for your help!

R RNA-Seq DESeq2 edgeR • 1.5k views
4.6 years ago

From the DESeq2 vignette: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#can-i-use-deseq2-to-analyze-a-dataset-without-replicates

Can I use DESeq2 to analyze a dataset without replicates?

If a DESeqDataSet is provided with an experimental design without replicates, a warning is printed, that the samples are treated as replicates for estimation of dispersion. This kind of analysis is only useful for exploring the data, but will not provide the kind of proper statistical inference on differences between groups. Without biological replicates, it is not possible to estimate the biological variability of each gene. More details can be found in the manual page for ?DESeq.

So be careful when interpreting your results ;)

4.6 years ago
ATpoint 82k

To the best of my knowledge, DESeq2 at least will apply the dispersion estimate from the replicated group to the unreplicated group. So, if you are willing to assume that the dispersion in the replicated group is representative of the other group, you could simply try running it. If, for example, A is a cancer sample and B is normal, then the dispersion in A is probably much larger than in B. In that case you would overestimate the dispersion for B and so detect fewer differential genes than there actually are. This might be acceptable, since it at least avoids false positives (it depends on your scientific question). Conversely, if B were the cancer sample, you would probably strongly underestimate its dispersion and get a lot of false positives. What are these samples? Decide for yourself whether the above assumption holds for your data.

Edit: See the statements from the DESeq2 and edgeR developers on this matter here: https://support.bioconductor.org/p/63585/
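
To make the dispersion argument above a bit more concrete, here is a minimal Python sketch. It is not what DESeq2 or edgeR actually compute. It only assumes the usual negative binomial mean-variance relation, variance = mu + alpha * mu^2, and uses a rough delta-method approximation for the standard error of a log fold change. The function names and all numbers (means of 200 and 100, three replicates against one, the alpha values) are made up for illustration.

```python
import math


def nb_variance(mu, alpha):
    """Negative binomial variance for mean mu and dispersion alpha."""
    return mu + alpha * mu ** 2


def approx_z(mean_a, n_a, mean_b, n_b, alpha):
    """Rough z-statistic for log(mean_a / mean_b).

    Assumption: one shared dispersion alpha for both groups and a
    delta-method approximation, Var(log mean) ~ (1/mu + alpha) / n.
    This is an illustration, not the DESeq2 Wald test.
    """
    var_log_a = nb_variance(mean_a, alpha) / (n_a * mean_a ** 2)
    var_log_b = nb_variance(mean_b, alpha) / (n_b * mean_b ** 2)
    se = math.sqrt(var_log_a + var_log_b)
    return math.log(mean_a / mean_b) / se


def two_sided_p(z):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Same two-fold difference, evaluated under different assumed dispersions.
# Hypothetical setup: group A has 3 replicates, group B has 1.
for alpha in (0.01, 0.1, 0.5):
    z = approx_z(200.0, 3, 100.0, 1, alpha)
    print(f"alpha={alpha}: z={z:.2f}, p={two_sided_p(z):.4g}")
```

The same fold change gets a larger p-value as the assumed dispersion grows. That is why borrowing a dispersion that is too large for the unreplicated group is conservative, and borrowing one that is too small produces false positives.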


Thanks for your detailed explanation; I learned a lot. In my study, A is a patient with trisomy 21 and B is normal. I am confused about why the dispersion in A would probably be much larger than in B if A were a cancer sample. Could you explain that in more detail? Thanks very much.

