REVIGO equivalent for Reactome/KEGG/WikiPathways
23 months ago

Hello everyone! I've been reading Biostars for a while and have found tons of valuable info there. This is my first question.

Imagine that you have a long list of GO terms from an enrichment analysis. It's hard to see the bigger picture and even harder to present it. For GO, you can use REVIGO, GOSemSim, or simplify() from clusterProfiler to reduce some of that redundancy by exploiting the parent-child relationships between terms.

Is there a way to do something similar with a list of pathways from Reactome, KEGG, or WikiPathways?

I've spent several days trying to find a suitable tool or easy R-code solution, but either there's none or I'm overlooking something obvious. Thanks!

pathway analysis Reactome KEGG WikiPathways

For Reactome and KEGG, I have tried http://bioinformatics.cing.ac.cy/PathwayConnector, which offers a few clustering algorithms, but its visualization gets messy with a large number of pathways (100-150). The tool also adds extra "connecting" pathways to the list, which I don't want, and it uses the 2016 versions of the databases.

I also looked at clusterProfiler, Pathview, ReactomeFIViz, pathfindR, and the Reactome and KEGG websites, but none of them seems to provide the functionality I'm looking for.

4 months ago

Hi Maciej, I have a similar problem: I am trying to draw a network diagram of enriched Reactome pathways (results generated by ReactomePA). Many of the enriched pathways cluster closely together, and I suspect they are redundant. But how can redundant pathways be identified and removed when the simplify() function doesn't work (it only handles GO results)?

I'm a beginner with R, but since there are no other answers, I will share my partial solution in the hope that somebody else will respond with a better one!

Let 'gsea' be the GSEA result object from ReactomePA (for example, the output of gsePathway()). I am using enrichplot 1.11.1.992.

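Generate 'similarities', a matrix of pairwise similarities between the enriched pathways (these are the edge weights in the enrichment map network). By default, pairwise_termsim() scores each pair by the Jaccard coefficient of their gene sets.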

```r
library(enrichplot)
# pairwise similarity matrix between enriched pathways (default method = "JC")
similarities <- pairwise_termsim(gsea)@termsim
```
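To make the contents of this matrix concrete, here is a small base-R sketch that builds the same kind of table from three made-up gene sets. It assumes that the default score is the Jaccard coefficient (genes shared by two pathways divided by genes in either) and that only the upper triangle is filled, with rows ordered from most to least significant. The pathway names and genes are hypothetical.

```r
# Hypothetical gene sets, listed from most to least significant pathway
gene_sets <- list(
  pathway_A = c("G1", "G2", "G3", "G4", "G5"),
  pathway_B = c("G1", "G2", "G3", "G4", "G6"),
  pathway_C = c("G7", "G8", "G9")
)

# Jaccard coefficient: shared genes divided by genes in either set
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Fill only the upper triangle (i < j); the diagonal and lower triangle stay NA
n <- length(gene_sets)
toy_sim <- matrix(NA_real_, nrow = n, ncol = n,
                  dimnames = list(names(gene_sets), names(gene_sets)))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    toy_sim[i, j] <- jaccard(gene_sets[[i]], gene_sets[[j]])
  }
}
print(round(toy_sim, 2))
```

Here pathway_A and pathway_B share 4 of 6 genes, so their score is about 0.67, while pathway_C shares nothing with either.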


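Replace NA values with 0. Each pair of pathways is compared only once, so only one triangle of the matrix is filled and the diagonal and the other triangle are NA.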

```r
similarities[is.na(similarities)] <- 0
```


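Transpose the matrix. Rows follow the order of the GSEA results (most significant first), so after transposing, each row holds a pathway's similarity to the pathways ranked above it. The most enriched pathways are in the top rows, and the first row is all 0. Further down you may start to see rows with higher similarity values.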

```r
similarities <- t(similarities)
```
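A toy example of the NA replacement and transposition, using a hypothetical 3 x 3 matrix with only the upper triangle filled and the most significant pathway first (the values are made up):

```r
# Hypothetical similarity matrix: rows/columns ordered from most to least
# significant, only the upper triangle filled
toy_sim <- matrix(c(NA, 0.85, 0.10,
                    NA, NA,   0.20,
                    NA, NA,   NA),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("P1", "P2", "P3"), c("P1", "P2", "P3")))

toy_sim[is.na(toy_sim)] <- 0   # missing comparisons become 0
toy_sim <- t(toy_sim)          # row j now holds similarities to pathways ranked above j
print(toy_sim)
```

After transposing, row P1 is all 0, row P2 holds its similarity to P1, and row P3 holds its similarities to P1 and P2.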


Make a logical 'redundant' matrix that marks which pathway pairs have a similarity above a chosen threshold (I used 0.8). This is based on information from https://stackoverflow.com/questions/43667495/how-delete-all-rows-that-contain-a-certain-value-regardless-of-what-column-it-is.

```r
redundant <- similarities > 0.8
```


Define which pathways are redundant by the presence of any TRUE value in their row. Because the matrix includes only one comparison for each pair of pathways, only the less significant pathway of a redundant pair is flagged as redundant.

```r
redundant <- rowSums(redundant)                # number of pairs above the threshold per pathway
redundant <- names(redundant[redundant > 0])   # pathways with at least one such pair
```
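The flagging step can also be written as a one-liner with which(). Below it runs on a hypothetical matrix that has already been NA-filled and transposed, so row j holds the similarities of pathway j to the pathways ranked above it:

```r
# Hypothetical transposed similarity matrix (row j = similarity to pathways ranked above j)
sim <- matrix(c(0,    0,    0,    0,
                0.85, 0,    0,    0,
                0.10, 0.20, 0,    0,
                0.30, 0.90, 0.05, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("P", 1:4), paste0("P", 1:4)))

cutoff <- 0.8
# A pathway is flagged if its similarity to any more significant pathway exceeds the cutoff
redundant <- names(which(rowSums(sim > cutoff) > 0))
print(redundant)
```

P2 is flagged because of P1 (0.85), and P4 because of P2 (0.90); P1 and P3 are kept.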


Define the nonredundant pathways as all pathways in the similarity matrix minus the ones identified as redundant.

```r
nonredundant <- row.names(similarities)[!row.names(similarities) %in% redundant]
```
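For reuse, the whole procedure can be wrapped in a small function. This is only a sketch: filter_redundant() and its greedy argument are my own hypothetical additions, not part of enrichplot. It assumes the input is the termsim matrix with rows and columns ordered from most to least significant; symmetrizing with pmax() lets it work whichever triangle is filled. With greedy = FALSE it applies the same rule as the steps above.

```r
# Hypothetical helper wrapping the steps above.
# termsim: similarity matrix, rows/columns ordered most -> least significant,
#          one triangle filled and NA elsewhere (assumed layout)
filter_redundant <- function(termsim, cutoff = 0.8, greedy = FALSE) {
  sim <- termsim
  sim[is.na(sim)] <- 0
  sim <- pmax(sim, t(sim))          # symmetric, so either triangle works
  n <- nrow(sim)
  keep <- rep(TRUE, n)
  for (j in seq_len(n)[-1]) {
    earlier <- seq_len(j - 1)        # more significant pathways ranked above j
    if (greedy) earlier <- earlier[keep[earlier]]  # only compare with kept pathways
    if (any(sim[j, earlier] > cutoff)) keep[j] <- FALSE
  }
  rownames(sim)[keep]
}

# Toy input: upper triangle only, hypothetical values
toy <- matrix(NA_real_, 4, 4, dimnames = list(paste0("P", 1:4), paste0("P", 1:4)))
toy[1, 2] <- 0.85
toy[1, 3] <- 0.10
toy[1, 4] <- 0.30
toy[2, 3] <- 0.20
toy[2, 4] <- 0.90
toy[3, 4] <- 0.05

print(filter_redundant(toy))                 # same rule as the steps above
print(filter_redundant(toy, greedy = TRUE))  # greedy variation
```

With real data, filter_redundant(pairwise_termsim(gsea)@termsim) should return the same pathways as nonredundant. The greedy option is a swapped-in variation: it only drops a pathway when it resembles a pathway that was kept, so in the toy example P4 survives because its only close match, P2, was already removed.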


This vector of nonredundant enriched pathways can now be passed to the showCategory argument of enrichplot's plotting functions (for example, emapplot(pairwise_termsim(gsea), showCategory = nonredundant)), which will then plot only the nonredundant enriched pathways.