It is a pretty bad idea to work without biological replicates when doing differential gene expression analysis, as it is then difficult to account for biological variability. The edgeR authors also explicitly advise against working without replicates. Nevertheless, there is a section on this topic in the edgeR user guide (page 19). Reading that section might be more useful than the normal tutorials, which assume replication.
You can also use the R packages DESeq or DESeq2 for the same purpose (although they differ slightly methodologically), but the authors of those tools also strongly advise against using unreplicated data.
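The edgeR user guide's suggestion for unreplicated data is to pick a plausible biological coefficient of variation (BCV) instead of estimating it (it quotes 0.4 as typical for human data) and to pass its square as the dispersion to the exact test. The Python sketch below illustrates the idea behind such a conditional exact test under loudly stated assumptions: one library per condition, equal library sizes, and a fixed dispersion of bcv**2. `conditional_exact_p` is a hypothetical helper written for illustration, not edgeR code; edgeR's `exactTest` also handles unequal library sizes and may use a different two-sided rule, so its p-values will differ.

```python
import math


def nb_log_weight(k, size):
    # log of C(k + size - 1, k), the k-dependent factor of the NB pmf
    return math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)


def conditional_exact_p(y1, y2, bcv):
    """Two-sided p-value for H0: both samples share the same mean.

    Hypothetical helper. Assumes one library per condition, equal library
    sizes, and a dispersion fixed in advance as bcv**2 (not estimated).
    """
    size = 1.0 / bcv ** 2  # NB size parameter r = 1 / dispersion
    n = y1 + y2
    # Given the total n, P(Y1 = k) is proportional to
    # C(k + r - 1, k) * C(n - k + r - 1, n - k); the NB probability p cancels.
    logw = [nb_log_weight(k, size) + nb_log_weight(n - k, size)
            for k in range(n + 1)]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]  # rescale to avoid underflow
    total = sum(w)
    probs = [v / total for v in w]
    observed = probs[y1]
    # Add up every outcome that is no more likely than the observed split
    p = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(p, 1.0)


if __name__ == "__main__":
    # 0.4 is the BCV the edgeR guide quotes as typical for human data
    print(conditional_exact_p(10, 200, bcv=0.4))
```

With the assumed BCV of 0.4, counts of 10 versus 200 give a small p-value, and assuming a BCV of 0.01 (technical replicates) gives a far smaller one. Either way the answer hinges on a dispersion you chose, not one you measured.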
Written 9.3 years ago by utzermel; updated 20 months ago by Ram.
As utzermel said, you should not infer DE without biological replicates; this paper and this tech note provide clear reasoning and examples of why not.
That said, funding reality often trumps best practices. The Trinity developers wrapped a script and tutorial for analyzing DE even without replication. The caveat is that, I believe, it will only work with a transcriptome assembled by Trinity, but you should be able to read the script and adapt it to your case.
Since you only have one sample per condition, you obviously cannot compute the dispersion from within-condition replicates. If you use DESeq or DESeq2, the program will nevertheless compute a "mock" dispersion for every gene by treating the two samples as replicates of the same condition. This relies on the assumption that most genes are not differentially expressed. In this way you can still fit a dispersion-mean trend and obtain a moderated estimate of the dispersion for every gene. This dispersion is likely to be somewhat overestimated (the genes that truly are differentially expressed inflate the between-sample variance), but it allows you to model your data with a negative binomial distribution. Given the weak statistical setup you probably won't get any significant DE genes, but it could be informative to look at the top-ranked ones.
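The "mock dispersion" idea can be sketched in a few lines of Python. This is a simplified illustration, not DESeq or DESeq2 code, and all function names are hypothetical. It assumes the counts are already normalized for library size, uses a per-gene method-of-moments estimate from the two samples pooled as if they were replicates, and fits the trend alpha(mu) = a0 + a1/mu by ordinary least squares (the parametric dispersion fits in DESeq/DESeq2 have this form but are fitted differently). Using the fitted value for every gene is roughly what DESeq's blind, fit-only mode is meant to do.

```python
def blind_dispersion(x1, x2):
    # Treat the two samples (one per condition) as replicates of one condition.
    # Method of moments: Var = mu + alpha * mu^2  =>  alpha = (v - mu) / mu^2
    mu = (x1 + x2) / 2.0
    if mu == 0:
        return None  # an all-zero gene carries no information
    v = (x1 - x2) ** 2 / 2.0  # sample variance with n - 1 = 1 degree of freedom
    return max((v - mu) / mu ** 2, 0.0)


def fit_trend(means, disps):
    # Ordinary least squares for alpha(mu) = a0 + a1 / mu (simplified fit)
    xs = [1.0 / m for m in means]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(disps) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # all genes have the same mean: flat trend
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, disps))
    a1 = sxy / sxx
    return my - a1 * mx, a1


def moderated_dispersions(counts):
    """counts: list of (x1, x2) normalized counts, one pair per gene."""
    est = [blind_dispersion(x1, x2) for x1, x2 in counts]
    means = [(x1 + x2) / 2.0 for x1, x2 in counts]
    usable = [(m, d) for m, d in zip(means, est) if d is not None]
    a0, a1 = fit_trend([m for m, _ in usable], [d for _, d in usable])
    # Replace each gene's noisy estimate by the trend value, floored to stay > 0
    return [None if d is None else max(a0 + a1 / m, 1e-8)
            for m, d in zip(means, est)]
```

Because real DE genes contribute their between-condition difference to `v`, the per-gene estimates and hence the trend tend to be inflated, which matches the overestimation mentioned above.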
You need replicates to estimate the dispersion of a negative-binomial model, because the dispersion is estimated from the variation among replicates. Basically, the dispersion measures the variability between your biological (and technical) replicates.
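The role of the dispersion is easiest to see in the negative-binomial variance function, Var(Y) = mu + phi * mu^2. The minimal Python sketch below (helper names are hypothetical) splits a gene's variance into the Poisson part and the extra part governed by the dispersion phi; edgeR reports sqrt(phi) as the biological coefficient of variation (BCV).

```python
import math


def nb_variance(mu, phi):
    # Negative binomial variance: Poisson sampling noise (mu)
    # plus extra, mostly biological, variance (phi * mu^2)
    return mu + phi * mu ** 2


def variance_parts(mu, phi):
    # Break the variance into its two components and report the BCV
    return {"poisson": mu, "extra": phi * mu ** 2, "bcv": math.sqrt(phi)}


if __name__ == "__main__":
    print(nb_variance(100, 0.16), variance_parts(100, 0.16))
```

For a mean of 100 counts and phi = 0.16 (BCV 0.4), the variance is 1700, of which only 100 is Poisson noise. Without replicates there is nothing in the data to pin down the other 1600.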
You could also assume that biological variability can be ignored (not a good assumption) and mock a few values for each condition. For example:
A4GALT 3
can be mocked into
A4GALT 3 2 4
Here, 2 and 4 stand in for the replicate counts you assume you would have gotten had you actually done the replicated experiment.
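To see why mocking is risky, the Python sketch below computes a method-of-moments dispersion from the observed count plus invented replicates (`mom_dispersion` is a hypothetical helper, not part of any package). The estimate is entirely determined by how far apart you choose to place the mock values.

```python
def mom_dispersion(values):
    # Method-of-moments NB dispersion from replicate counts:
    # Var = mu + phi * mu^2  =>  phi = (s^2 - mu) / mu^2, floored at 0
    n = len(values)
    mu = sum(values) / n
    s2 = sum((v - mu) ** 2 for v in values) / (n - 1)
    return max((s2 - mu) / mu ** 2, 0.0)


observed = 3  # the single real A4GALT count
for mocks in ([2, 4], [1, 5], [0, 6]):  # invented "replicates"
    print(mocks, mom_dispersion([observed] + mocks))
```

For the observed count of 3, mocking 2 and 4 gives a dispersion of 0 (the invented spread is smaller than Poisson noise), 1 and 5 give about 0.11, and 0 and 6 give about 0.67.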