Question

What's your preferred pathway enrichment analysis tool after DEG analysis and why?

2

6.5 years ago

unawaz ▴ 60

So firstly, I'm completely aware that this type of question has been asked multiple times (I know this since I've been scrolling through these types of questions for the past 2 days), but I'm actually more interested in knowing the reasons why some people prefer one tool over another.

I've performed differential expression analysis using DESeq2 and I want to see which Gene Ontology terms, KEGG pathway terms, etc. are enriched in my data set. I initially tried clusterProfiler, but I keep getting 3 enriched terms for all my differentially expressed genes using enrichGO(). I also know that some clusterProfiler functions require logFC values as input, so I wasn't sure whether those should be for ALL the genes analysed or just the differentially expressed genes.

I've also used goseq, but my main issue with that is that the GO terms are too broad.

I also only have about 300 DEGs, so I'm not really sure if this sort of analysis is best performed when you have a myriad of DEGs, or can be done with a small number.

Anyway, looking forward to hearing people's responses :)

gene ontology RNA-Seq • 6.0k views

ADD COMMENT • link updated 6.5 years ago by Jean-Karim Heriche 27k • written 6.5 years ago by unawaz ▴ 60

0

Check out our GO_MWU: great power, no need to split the data into DEGs / non-DEGs (the test is rank-based, so it can use any measure by which the genes can be ranked), and an intuitive graphical representation of results. https://github.com/z0on/GO_MWU - Misha
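To illustrate what "rank-based, no DEG cutoff" means in practice, here is a minimal Python sketch of a Mann-Whitney U test comparing the scores of genes inside a category against all other genes. It is not GO_MWU's code: the function names, the toy scores and the category membership are all made up for illustration, and the p-value uses a plain normal approximation with no tie or continuity correction.

```python
from math import erfc, sqrt

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(scores, in_set):
    """Return (U, two-sided p) for genes in the set versus the rest.

    Normal approximation only, so treat p as rough for small sets.
    """
    ranks = average_ranks(scores)
    n1 = sum(in_set)
    n2 = len(scores) - n1
    rank_sum = sum(r for r, member in zip(ranks, in_set) if member)
    u = rank_sum - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, erfc(abs(z) / sqrt(2))

# Hypothetical per-gene scores (e.g. signed log p-values or logFC) for ALL genes tested.
scores = [2.1, 1.8, 1.5, 0.3, -0.2, 0.1, -0.5, 0.4, -1.0, 0.0]
# Hypothetical membership of each gene in one GO category.
in_set = [True, True, True, False, False, False, False, False, False, False]

u_example, p_example = mann_whitney_u(scores, in_set)
print(u_example, p_example)
```

Every gene that was tested goes in, with its score, so there is no significance threshold to pick.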

ADD REPLY • link 6.4 years ago by matz ▴ 10

2

6.5 years ago

Devon Ryan 105k

Essentially every free tool is using the same set of databases and quite similar algorithms (there's a smallish set to choose from with a few tweaks here and there), so it's unsurprising that you get similar results regardless of which tool you use. To be frank, if you want different results you need to use a different database. We're pretty happy with IPA in this regard. It can be rather pricey, but if you can go in with multiple labs on a license then it becomes more feasible.
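For readers wondering what these "quite similar algorithms" look like, one of the most common is the over-representation test based on the hypergeometric distribution. Below is a minimal Python sketch of it using only the standard library. The function name and all the counts are hypothetical, and it is not the code of any particular tool.

```python
from math import comb

def hypergeom_enrichment_p(universe_size, term_size, deg_count, overlap):
    """P(X >= overlap) when deg_count genes are drawn without replacement
    from a universe in which term_size genes carry the annotation."""
    max_overlap = min(term_size, deg_count)
    total = comb(universe_size, deg_count)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, deg_count - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical experiment: 15000 genes tested, 300 DEGs,
# a term annotating 200 of the tested genes, 12 of which are DEGs.
p_enriched = hypergeom_enrichment_p(15000, 200, 300, 12)

# Same term, but only 4 DEGs overlap (about what chance alone would give).
p_chance = hypergeom_enrichment_p(15000, 200, 300, 4)

print(p_enriched, p_chance)
```

The universe here is the set of genes actually tested for differential expression, not the whole genome. Changing the universe changes the p-values even though the DEG list stays the same.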

ADD COMMENT • link 6.5 years ago by Devon Ryan 105k

1

6.5 years ago

EagleEye 7.6k

I use Gene Set Clustering based on Functional annotation (GeneSCF)

ADD COMMENT • link 6.5 years ago by EagleEye 7.6k

1

6.5 years ago

Jean-Karim Heriche 27k

An alternative approach: Finding New Order in Biological Functions from the Network Structure of Gene Annotations

ADD COMMENT • link 6.5 years ago by Jean-Karim Heriche 27k

score 5 · Accepted Answer · 2019-01-10

5

6.5 years ago

i.sudbery 21k

I'm a big user of GOseq, which lets you control for the fact that longer and more highly expressed genes are more likely to be found to be differential. If long or highly expressed genes are not evenly distributed between pathways/categories, this can bias your enriched pathways.

I also like the GSEA-like algorithms, because you do not have to set an artificial limit on what you consider significant. The version of this where you rank on significance suffers from the same gene-length bias that traditional GO tests suffer from, but this may be lessened by ranking on some suitably shrunken logFC metric. cameraPR from limma is a good example of this.

Finally, we've used SPIA before, which is a pathway enrichment tool that takes the topology of the network into account. It's a great idea, let down by the quality of the pathway annotations it runs on.
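To make the bias concrete, here is a small Python sketch that compares two resampling nulls for the same category: one where every gene is equally likely to be called differential, and one where genes are drawn with unequal weights. This is not GOseq's actual method. The weights, gene counts and function name are invented for illustration, and in a real analysis the weights would have to be estimated from the data.

```python
import random

def sampled_enrichment_p(weights, in_category, n_deg, observed, n_draws=1000, seed=0):
    """Empirical P(overlap >= observed) when n_deg genes are drawn without
    replacement, each gene favoured in proportion to its (positive) weight."""
    rng = random.Random(seed)
    genes = range(len(weights))
    hits = 0
    for _ in range(n_draws):
        # Efraimidis-Spirakis trick: keeping the n_deg largest u**(1/w) keys
        # gives a weighted sample without replacement.
        keys = [rng.random() ** (1.0 / w) for w in weights]
        drawn = sorted(genes, key=keys.__getitem__, reverse=True)[:n_deg]
        if sum(in_category[g] for g in drawn) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)

# Hypothetical data: 200 genes, the first 20 belong to the category and are
# "long", so they are assumed to be 3x as likely to be called differential.
n_genes = 200
in_category = [g < 20 for g in range(n_genes)]
length_weights = [3.0 if member else 1.0 for member in in_category]
uniform_weights = [1.0] * n_genes

# 30 DEGs observed, 8 of them in the category.
p_uniform = sampled_enrichment_p(uniform_weights, in_category, n_deg=30, observed=8)
p_weighted = sampled_enrichment_p(length_weights, in_category, n_deg=30, observed=8)

print(p_uniform, p_weighted)
```

Under the uniform null the category looks clearly enriched. Once the null accounts for the category's genes being easier to detect, the same overlap is much less surprising.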

ADD COMMENT • link 6.5 years ago by i.sudbery 21k

0

The version of this where you rank on significance suffers from the same gene-length bias that traditional GO tests suffer, but this may be lessened by ranking on some suitably shrunken logFC metric. cameraPR from limma is a good example of this.

Could you expand on this? It seems that in general, camera and cameraPR get used with the limma moderated-t statistic.

For anyone else confused about shrinkage of logFC, see Gordon Smyth's reply to this question regarding how logFC is shrunk towards zero by limma etc.

ADD REPLY • link 5.4 years ago by alexvpickering ▴ 60

0

Indeed, most people use camera and cameraPR with t. But I think this would still suffer from length bias. You could use the shrunken LFCs that come out of DESeq2. However, there was recently a paper suggesting that even this still suffers from length bias.

ADD REPLY • link 5.4 years ago by i.sudbery 21k