I'm thinking about using DAVID for GO-term enrichment analysis of a set of DE genes from RNA-seq. The thing is, all the references I've found so far on using DAVID are for microarray data, and some for ChIP-seq data.
Besides, in the paper describing GOseq, they point out that
standard methods give biased results on RNA-seq data due to over-detection of differential expression for long and highly expressed transcripts.
So it makes me wonder whether DAVID is really appropriate. Have you used it for RNA-seq?
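For context, the "standard" test at issue here is a one-sided hypergeometric (Fisher exact) test: given N background genes, K of them annotated to a GO term, and n DE genes, how surprising is it to see k or more annotated genes in the DE list? DAVID uses a modified version of this test (the EASE score). The sketch below is only the plain test, not DAVID's code, and the function name and numbers are hypothetical. Its key assumption, which GOseq addresses, is that every gene is equally likely to end up in the DE list regardless of length or expression level.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. all genes tested for DE)
    K: background genes annotated to the GO term
    n: genes in the DE list
    k: DE genes annotated to the GO term
    Every gene is treated as equally likely to be DE, which is the
    assumption that length and expression bias in RNA-seq violates.
    """
    upper = min(n, K)
    # math.comb returns 0 when asked for more items than available
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 10,000 tested genes, 200 annotated to a term,
# 500 DE genes, 25 of which carry the annotation (about 10 expected).
print(hypergeom_enrichment_p(25, 500, 200, 10_000))
```

If long genes are both over-represented in the DE list and over-represented in some GO category, this test will report enrichment for that category even when the biology does not call for it.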
Hi, I was thinking of using DAVID software with RNAseq data BUT only for selecting the common DEGs that appear in different RNAseq experiments with different samples.
I mean, I have RNA-seq data from the comparison of two samples, and another RNA-seq dataset from the comparison of two different samples. I want to know which DEGs are shared by both experiments, regardless of their functional characteristics. Do you know whether DAVID can do this, or whether there is other software for it?
I only want to know which genes are differentially expressed commonly in all the comparisons.
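For this question no enrichment tool should be needed: genes that are DE in every comparison are simply the intersection of the DE gene lists, provided all lists use the same identifiers (e.g. the same annotation release). A minimal Python sketch; the function name and gene IDs are placeholders.

```python
def common_degs(*deg_lists):
    """Return the genes called DE in every comparison (set intersection)."""
    if not deg_lists:
        return set()
    common = set(deg_lists[0])
    for genes in deg_lists[1:]:
        common &= set(genes)  # keep only genes present in this list too
    return common


# Hypothetical gene IDs standing in for the DE calls of two experiments
exp1 = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
exp2 = ["GENE_B", "GENE_D", "GENE_E"]
print(sorted(common_degs(exp1, exp2)))  # ['GENE_B', 'GENE_D']
```

A plain intersection ignores the direction of change; if that matters, you would also compare the sign of the fold change for each shared gene.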
I was going to cite this paper for further exploration, but on a quick re-skim, it seems they (sadly) didn't do any analysis of tag count vs. the ability to call differential expression. Still useful to peruse, though ...
Some navel-gazing in the hope of being a bit more thorough: if I recall correctly, the problem is the bias of RNA-seq toward calling differential expression in longer transcripts. If you are using a tag-sequencing method for gene expression analysis that doesn't have this bias (e.g. SAGEseq, deepCAGE, or similar), "normal" GO analysis downstream of appropriate differential expression calls should suffice, no?
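One way to check whether length bias matters in your own data (and whether a tag-based protocol really avoids it) is to look at the fraction of genes called DE as a function of transcript length. GOseq fits a smooth monotonic curve to this relationship (its probability weighting function); the sketch below is only a crude binned version of that idea, not GOseq's method, and the function name and toy data are hypothetical.

```python
def de_rate_by_length(lengths, is_de, n_bins=3):
    """Fraction of genes called DE in roughly equal-sized bins of transcript length.

    Returns (shortest length, longest length, fraction DE) for each bin.
    A fraction that rises across bins suggests length bias; a roughly
    flat profile is what a length-unbiased protocol would be expected to give.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    n = len(order)
    rates = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue
        frac = sum(is_de[i] for i in idx) / len(idx)
        rates.append((lengths[idx[0]], lengths[idx[-1]], frac))
    return rates


# Toy data: transcript lengths (bp) and whether each gene was called DE
lengths = [500, 800, 1500, 2000, 4000, 6000]
is_de = [False, False, False, True, True, True]
print(de_rate_by_length(lengths, is_de))
# [(500, 800, 0.0), (1500, 2000, 0.5), (4000, 6000, 1.0)]
```

With real data you would want many more genes per bin before reading anything into the trend.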
This one might be closer? http://genomebiology.com/2010/11/10/R106. More analytically, think of the counts as Poisson-distributed (they are not, but the approximation is not too far off for low counts), so the mean and variance are equal. As the number of counts increases, the distribution becomes tighter relative to its mean (the standard deviation grows only as the square root of the mean), so the same fold change is easier to detect. I'm no statistician, but hopefully the point is coming across.
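The Poisson argument can be made concrete: if the mean and variance are both μ, the standard deviation is √μ, so the spread relative to the mean (the coefficient of variation) is 1/√μ and shrinks as counts grow. A minimal sketch, with a helper name of our own:

```python
from math import sqrt


def poisson_cv(mean_count):
    """Coefficient of variation (sd / mean) of a Poisson count with the given mean."""
    return sqrt(mean_count) / mean_count  # equals 1 / sqrt(mean_count)


for mu in (4, 100, 10_000):
    # Absolute spread (sqrt(mu)) grows, but relative spread falls.
    print(mu, sqrt(mu), poisson_cv(mu))
# 4 2.0 0.5
# 100 10.0 0.1
# 10000 100.0 0.01
```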
Highly expressed genes are also more likely to be called differentially expressed than lowly expressed genes. This effect is independent of transcript-length bias, so you are still not on firm ground with SAGE or CAGE. I do not know how biased the results will be in practice, though.
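To see why highly expressed genes are favoured even at the same fold change, compare two genes that both show a 2:1 ratio between conditions, one at low and one at high counts. Under a Poisson model with equal library sizes (an assumption made here for simplicity), the count in one condition given the total is Binomial(n, 0.5) under the null, which gives an exact test. This is a textbook illustration, not what DESeq or edgeR do (they use negative binomial models), and the function name is ours.

```python
from math import comb


def poisson_ratio_test(x1, x2):
    """Exact two-sided test of equal Poisson rates, conditioning on x1 + x2.

    Assumes equal library sizes, so under the null x1 ~ Binomial(x1 + x2, 0.5).
    """
    n = x1 + x2
    k = min(x1, x2)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n  # P(X <= k)
    return min(1.0, 2 * tail)  # symmetric null, so double the smaller tail


low = poisson_ratio_test(5, 10)       # 2-fold change at low counts
high = poisson_ratio_test(500, 1000)  # same 2-fold change at high counts
print(low, high)  # low is about 0.30; high is far below any usual cutoff
```

The fold change is identical, but only the highly expressed gene would be called DE, which is the bias described above.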
Thanks a lot for the comments!