How to calculate differential expression with edgeR when there are no replicates in the count data?
8.8 years ago

Hey, I followed this tutorial for DE analysis using edgeR, but I couldn't plot the graphs for my data of 20812 gene counts:

Gene    Control    CASE_HYD004
SEC24B-AS1    5    16
A1BG    0    3
A1CF    0    0
GGACT    2    0
A2M    8    1572
A2ML1    0    0
A2MP1    2    3
A4GALT    3    71
A4GNT    0    0
AAAS    79    60
AACS    66    567

I get an error saying there are no replicates. In this case, how can I calculate the dispersion using edgeR?

I would appreciate any suggestions, including any other accurate tool besides edgeR.

edgeR • 3.5k views

It is a pretty bad idea to work without biological replicates when doing differential gene expression analysis, because without them you cannot estimate biological variability. The edgeR authors also explicitly advise against working without replicates. Nevertheless, there is a section on this topic in the edgeR user guide (page 19). Reading that section might be more useful than the normal tutorials, which assume replication.

You can also use the R packages DESeq or DESeq2 for the same purpose (although they differ slightly methodologically), but the authors of those tools also strongly advise against using unreplicated data.
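To make this concrete: one of the options the edgeR user guide discusses for unreplicated data is to pick a plausible biological coefficient of variation (BCV) from experience with similar data instead of estimating it. What follows is only a minimal Python sketch of what such a choice implies under the negative binomial model, not edgeR code. The function name `nb_variance` is made up for illustration, and the BCV of 0.4 is an assumed value that you would have to justify for your own experiment.

```python
import math


def nb_variance(mu, bcv):
    """Variance of a negative binomial count with mean mu.

    Uses the parameterisation var = mu + dispersion * mu^2,
    where dispersion = bcv^2 (bcv is an ASSUMED value, not estimated).
    """
    dispersion = bcv ** 2
    return mu + dispersion * mu ** 2


# Example: the Control count for AAAS from the question, used as the mean.
mu = 79
assumed_bcv = 0.4  # assumption; must be chosen from prior knowledge

sd_poisson = math.sqrt(nb_variance(mu, 0.0))       # technical noise only
sd_nb = math.sqrt(nb_variance(mu, assumed_bcv))    # with assumed biological noise

print(f"Poisson sd: {sd_poisson:.1f}")
print(f"NB sd with assumed BCV {assumed_bcv}: {sd_nb:.1f}")
```

With the AAAS counts from the question (79 vs 60), the assumed BCV makes the expected spread much wider than pure Poisson noise, which is why a modest difference like this one cannot be called significant. Any p-value you get this way depends entirely on the BCV you picked.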

8.8 years ago
h.mon 35k

As utzermel said, you should not infer DE without biological replicates - this paper and this tech note provide clear reasoning and examples why not.

That said, funding reality oftentimes trumps best practices. The Trinity folks wrapped a script and tutorial to analyze DE even without replication. The caveat is that I believe it will work only with a transcriptome assembled by Trinity, but you should be able to read the script and adapt it to your case.

8.8 years ago
Martombo ★ 3.1k

Since you only have one sample per condition, you cannot compute a per-condition value for the dispersion. If you use DESeq or DESeq2, the program will nevertheless compute a "mock" dispersion for every gene by treating the two samples as replicates of the same condition. This relies on the assumption that most genes are not differentially expressed. In this way you will still be able to fit a dispersion-to-mean trend and produce a final moderated estimate of the dispersion of every gene. This dispersion is likely to be somewhat overestimated, but it will allow you to model your data with a negative binomial distribution. Given the weak statistical setup you probably won't get any significant DE genes, but it could be informative to have a look at the top ones.
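To illustrate the idea of a "mock" dispersion, here is a rough Python sketch that treats the two samples as replicates and computes a simple method-of-moments estimate per gene. This is not the estimator DESeq or DESeq2 actually uses: it ignores library-size normalization, the trend fit, and the shrinkage step, and the function name `mock_dispersion` is hypothetical.

```python
def mock_dispersion(count_a, count_b):
    """Method-of-moments NB dispersion from two counts treated as replicates.

    Solves var = mean + dispersion * mean^2 for dispersion and
    truncates negative estimates at zero.
    """
    mean = (count_a + count_b) / 2
    if mean == 0:
        return 0.0
    # Unbiased sample variance for n = 2 reduces to (a - b)^2 / 2.
    var = (count_a - count_b) ** 2 / 2
    return max((var - mean) / mean ** 2, 0.0)


# Counts (Control, CASE_HYD004) taken from the table in the question.
genes = {"AAAS": (79, 60), "A2M": (8, 1572), "A2ML1": (0, 0)}
for name, (control, case) in genes.items():
    print(name, round(mock_dispersion(control, case), 4))
```

Genes that really differ between the two samples, such as A2M here, get a very large per-gene value. That is the reason the overall dispersion ends up overestimated and the test becomes conservative.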

8.8 years ago
scchess ▴ 640

You'll need replicates to fit a negative binomial model, which estimates the dispersion among replicates. Basically, the dispersion measures the variability between your biological and technical replicates.

You could also assume that the variability between biological replicates can be ignored (not a good assumption) and mock a few values for each condition. For example:

A4GALT    3

can be mocked into

A4GALT    3  2 4

2 and 4 are assumed to be the replicate values that you would get if you repeated the experiment.

