Current sources for pathway data/gene sets
10.3 years ago
Lou ▴ 10

Hi All,

Does anyone know of a good resource for getting current KEGG, BioCarta and Reactome pathway/gene set data in .gmt format? I have used the MSigDB C2 gene sets in the past, but these are now quite outdated. Any advice would be much appreciated.

Thanks,

MsigDB Gene-set Biocarta Kegg Pathway • 5.9k views
10.3 years ago
B. Arman Aksoy ★ 1.2k

It doesn't have KEGG and BioCarta, but you can download the GSEA files for various data sources from the latest Pathway Commons 2 web service:

http://www.pathwaycommons.org/pc2/downloads.html

These files contain UniProt IDs, though. You might need to map these back to another ID type depending on your needs, but PC2 also provides those mappings:

http://www.pathwaycommons.org/pc2/downloads/ambiguous_id_mapping.UNIPROT.txt
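The remapping step could look roughly like the sketch below. It assumes the mapping table is tab-separated with the UniProt accession in one column and the target ID in another; I have not checked the actual column layout of the PC2 file, so `key_col` and `value_col` are placeholders to adjust. The function names and the toy data (including the made-up accession `Q00000`) are only for illustration.

```python
import io


def parse_gmt(handle):
    """Yield (name, description, ids) for each line of a GMT file."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        yield fields[0], fields[1], fields[2:]


def load_mapping(handle, key_col=0, value_col=1):
    """Read a tab-separated ID table into {key: [values]}.

    The column positions are assumptions; check them against the real file.
    """
    mapping = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(key_col, value_col):
            continue
        mapping.setdefault(fields[key_col], []).append(fields[value_col])
    return mapping


def remap_gene_set(ids, mapping):
    """Replace each ID by its mapped IDs; drop unmapped IDs and duplicates."""
    seen = set()
    remapped = []
    for identifier in ids:
        for target in mapping.get(identifier, []):
            if target not in seen:
                seen.add(target)
                remapped.append(target)
    return remapped


# Toy input standing in for the downloaded files (Q00000 is a made-up ID).
gmt_text = "Pathway A\tsource X\tP04637\tP38398\tQ00000\n"
map_text = "P04637\tTP53\nP38398\tBRCA1\n"

mapping = load_mapping(io.StringIO(map_text))
remapped = [
    (name, description, remap_gene_set(ids, mapping))
    for name, description, ids in parse_gmt(io.StringIO(gmt_text))
]
for name, description, ids in remapped:
    print("\t".join([name, description] + ids))
```

Unmapped IDs are silently dropped here; counting them before discarding would show how much of each gene set survives the conversion.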


Hi Arman,

Thanks for the tip, that seems fairly straightforward to implement, will give it a try today.

10.3 years ago

Is this what you mean by .gmt format?

That is essentially just a list of genes for each pathway. Since most resources let you download such a per-pathway gene list as a flat file, you could easily create those files yourself.
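As a minimal sketch of that conversion: each .gmt line is tab-separated, with the set name first, a description second, and the genes after that. The function names and the toy gene sets below are hypothetical; real input would come from whatever flat file the resource provides.

```python
import io


def to_gmt_line(name, description, genes):
    """Format one gene set as a GMT line: name, description, then genes."""
    unique_genes = list(dict.fromkeys(genes))  # drop duplicates, keep order
    return "\t".join([name, description] + unique_genes)


def write_gmt(gene_sets, handle):
    """Write {name: (description, genes)} to an open text handle."""
    for name, (description, genes) in gene_sets.items():
        handle.write(to_gmt_line(name, description, genes) + "\n")


# Toy gene sets for illustration only.
gene_sets = {
    "Toy pathway 1": ("hypothetical source", ["TP53", "MDM2", "TP53"]),
    "Toy pathway 2": ("hypothetical source", ["BRCA1", "BRCA2"]),
}

buffer = io.StringIO()
write_gmt(gene_sets, buffer)
gmt_output = buffer.getvalue()
print(gmt_output, end="")
```

To write to disk instead, pass a file opened with `open("my_sets.gmt", "w")` in place of the `StringIO` buffer.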

Update: it turns out we actually have a .gmt file for the whole WikiPathways collection. We have now made it available here.


Hi Chris,

Thanks for your reply. Yes, that is exactly what I meant!

The main reason I am reluctant to make my own gene sets is that I am unsure which additional filters I should apply to each flat file (i.e. which gene sets I should exclude based on evidence codes or other properties). For this reason I thought it would be far simpler to use pre-compiled gene sets that follow agreed-upon standards. However, if I can't find anything like this then I will definitely think about making my own.


Yes! The choice of which pathways to use really is very relevant. Online collections should make clear what they include by adding provenance data to the sets, but that is seldom done. For WikiPathways we have a "curated" collection that is tagged as such and can be downloaded from the download collection at the PathVisio website.


Thanks again for your help. Just to check, are you referring to the wikipathways_Homo_sapiens_Curation-AnalysisCollection__gpml zip file from PathVisio?

From looking at the pathway files within this folder I can see that I could write a script that converts each file into the GSEA gmt format. However, I need to know which database each pathway was originally derived from (e.g. KEGG/BioCarta), and that is not possible unless you copy and paste the pathway's URL into your browser. This means I cannot quickly bin pathways from different databases into separate groups (which I want to do, as I would like to carry out separate enrichment analyses using gene sets/pathways from each database).
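If the source of each pathway were available as a lookup table, the binning step itself would be short. A rough sketch, assuming a hypothetical `source_of` dictionary that maps each gene set name to its database of origin (building that table is the part I cannot currently automate); all names and data here are placeholders:

```python
from collections import defaultdict


def bin_by_source(gene_sets, source_of, default="unknown"):
    """Group {set_name: genes} into {source: {set_name: genes}}.

    Sets missing from source_of are collected under `default`.
    """
    bins = defaultdict(dict)
    for name, genes in gene_sets.items():
        bins[source_of.get(name, default)][name] = genes
    return dict(bins)


# Placeholder gene sets and a hypothetical name-to-source lookup.
gene_sets = {
    "set_a": ["TP53", "MDM2"],
    "set_b": ["EGFR", "GRB2"],
    "set_c": ["BRCA1"],
}
source_of = {"set_a": "Reactome", "set_b": "NetPath"}

bins = bin_by_source(gene_sets, source_of)
for source in sorted(bins):
    print(source, sorted(bins[source]))
```

Each bin could then be written to its own .gmt file and run through a separate enrichment analysis.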

I am rather new to bioinformatics so I am not too well acquainted with the flat file format though...


Yes, conversion from the gpml format would be one way to do this. There are other (easier) ways to export that list, though. You could open the pathways in PathVisio, which has an export option, or you could download them directly from WikiPathways after selecting the same pathway again with the correct format selected. Alternatively, you could use the WikiPathways webservice or the SPARQL endpoint (which is probably the most powerful option).

Concerning your question about pathway collections: most WikiPathways pathways are actually original and were created on the wiki itself or by the original GenMAPP project that preceded it. Some were indeed converted from other sources; the important collections are NetPath and Reactome. These have their own portals, which you could use to select them. I don't think we actually have a lot of BioCarta pathways. KEGG is special because of the licensing problems. I would not advise you to get these from WikiPathways! We do have some pathways that are based on KEGG pathways, but these were substantially extended with newer information.


Thanks again. I want to use the entire list of pathways, so I guess I will just read the files into R and edit them in a short loop. PathVisio will be useful for post-hoc visualization of any significant pathways that pop up, though! Yes, I just found out about the KEGG license, so I will probably give these a miss since I doubt my institute wants to buy a license. For any others with the same question, up-to-date GO gene sets in gmt format can be obtained via the GO2MSIG tool.
