Question: Finding a list of genes contained in a given pathway
1
4.7 years ago by
erikfas20
Sweden
erikfas20 wrote:

I'm currently doing RNA-seq, and one of the things I would like to do is to see what genes are differentially expressed in different pathways related to the EGFR network. The pathways I'm especially interested in are the MAPK/ERK, PI3K/AKT and JAK/STAT pathways. The problem is, for me, how do I define these pathways? What I would like to do is to get a gene list of all the genes in each of the pathways, so I can say "according to <source(s)>, the JAK/STAT pathway consists of <list of genes>". The idea is then to see how the different pathways differ in terms of gene expression between my various experimental parameters, i.e. taking the analysis down from a global scale to a pathway scale.

Getting such gene lists was, apparently, all too easy, since there are so many different sources to choose from - which is the problem. I've looked at some other questions here at BioStar (A: How To Get Snps Matrix For Population Genetic Ananlysis From Snps Variant Files, Gene Pathway Association File For Kegg, Extracting List Of Genes Associated With A Pathway In Kegg) and checked some of the web-based tools to do it (KEGG, GSEA). I now have quite a few lists of the three pathways to choose from, but they are quite different. I found lists from Biocarta, KEGG, GO annotation and PID, but not a single source had genes for all three pathways, at least not that I could find.

How would one go about solving this? Just picking one of the lists at random or arbitrarily seems... iffy. I'm sure I'm not the first one to have this issue. How have you / would you solve it? Thanks in advance!

modified 4.7 years ago by Neilfws48k • written 4.7 years ago by erikfas20

The reason you find somewhat different lists comes down to both how broadly you want to define the pathway and what cell type you're using in your definition. In the former case, keep in mind that pathways like this are artificial constructs and could actually include all genes if you really wanted. In the latter case, it should be apparent that upstream regulators of things like MAPK are going to be completely different across cell types.

written 4.7 years ago by Devon Ryan89k

Very true, thanks for pointing that out. I'm mainly interested in colorectal cancer and a few related cell lines (HCT-116, RKO, CACO-2, HKE3). I would prefer "smaller" pathways (i.e. somewhere around 20-50 genes). As far as I could see in my so far limited time with the various tools, there is no way to search for "colorectal cancer" in the same way that you can search for "homo sapiens", or am I missing it?

written 4.7 years ago by erikfas20

I'm not surprised that you can't find anything specific to colorectal cancer. Most of the databases won't explicitly state what the source cell type is for a given piece of information. Yes, this makes pretty much any solution aside from going through the literature less than ideal. You might just compare the lists from a number of sources and include only the genes mentioned at least X times.
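The "mentioned at least X times" idea can be sketched in a few lines of Python. This is only a minimal illustration: the function name `consensus_genes`, the source labels, and the toy gene lists are made up for the example and do not reflect the actual contents of any database. It assumes all lists already use the same identifier type (here, gene symbols).

```python
from collections import Counter


def consensus_genes(gene_lists, min_sources):
    """Return genes that appear in at least `min_sources` of the given lists.

    `gene_lists` maps a source label to an iterable of gene symbols.
    """
    counts = Counter()
    for genes in gene_lists.values():
        # Use a set so a gene repeated within one source is counted once.
        counts.update({g.strip().upper() for g in genes if g.strip()})
    return sorted(g for g, n in counts.items() if n >= min_sources)


# Toy lists, invented for illustration only; not real database contents.
example_lists = {
    "source_a": ["EGFR", "KRAS", "BRAF", "MAPK1"],
    "source_b": ["EGFR", "KRAS", "MAPK1", "MAPK3"],
    "source_c": ["KRAS", "BRAF", "MAPK1", "SOS1"],
}

if __name__ == "__main__":
    print(consensus_genes(example_lists, min_sources=2))
```

Raising `min_sources` gives a smaller, more conservative list; whichever threshold is used should be stated alongside the sources when reporting the pathway definition.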

written 4.7 years ago by Devon Ryan89k

Ah... too bad. Do you have a favourite tool? Is there a tool that is purely based on literature, or do all of them have literature mining / computer-based backgrounds?

written 4.7 years ago by erikfas20

I don't have a favourite tool/database, unfortunately. The best one out there is probably IPA, from Ingenuity, but that's a commercial package. I only mention it as the likely best one since it incorporates manual curation by their staff. That's mostly for direct pathway analysis, though, and I don't know if you can get direct access to the underlying database for your needs.

written 4.7 years ago by Devon Ryan89k

Damn. Well, it would seem that the best way to do what I need is to just pick one of the pathways for whatever reason, state it, and then just go with it. Or do you have another idea?

written 4.7 years ago by erikfas20
3
4.7 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

The TogoWS REST service comes up in some of the answers that you have already found at this site.

KEGG does contain a colorectal cancer pathway and this URI will retrieve the associated genes:

curl http://togows.dbcls.jp/entry/pathway/hsa05210/genes.json

Then it's a case of processing the JSON using your language of choice.
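As one way of doing that processing, here is a minimal Python sketch. It does not contact the service; it only parses JSON text that has already been downloaded (for example, the `curl` output saved to a file). The payload layout is an assumption: a JSON list whose items are objects mapping a gene ID to a description of the form "SYMBOL; full name". Check the actual response before relying on it. The function names and the sample string are hypothetical.

```python
import json


def parse_togows_genes(payload):
    """Parse JSON text into a {gene_id: description} dictionary.

    ASSUMPTION: the payload is a JSON list of objects (or a single object)
    mapping a gene ID to a description string.
    """
    data = json.loads(payload)
    if isinstance(data, dict):
        data = [data]
    genes = {}
    for entry in data:
        if isinstance(entry, dict):
            for gene_id, description in entry.items():
                genes[str(gene_id)] = str(description)
    return genes


def gene_symbols(genes):
    """Extract sorted, unique gene symbols from the descriptions.

    ASSUMPTION: each description starts with the symbol, followed by ';'.
    """
    return sorted(
        {desc.split(";", 1)[0].strip() for desc in genes.values() if desc.strip()}
    )


# Hand-written sample in the assumed layout; not an actual service response.
sample = (
    '[{"1956": "EGFR; epidermal growth factor receptor", '
    '"3845": "KRAS; KRAS proto-oncogene, GTPase"}]'
)

if __name__ == "__main__":
    parsed = parse_togows_genes(sample)
    print(len(parsed), gene_symbols(parsed))
```

With the `curl` output redirected to a local file, the file's contents can be read and passed to `parse_togows_genes` in place of `sample`.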

written 4.7 years ago by Neilfws48k

Thanks! I did check that quickly before, actually, but since the gene list that I get from using the URI is so much larger than the one I can see in the corresponding KEGG pathway (this http://www.genome.jp/kegg-bin/show_pathway?hsa04010 can't possibly have about 250 genes in it), I kind of gave up on it just before asking this question. Why the discrepancy?

written 4.7 years ago by erikfas20

Hi Erik, to be clear, I cannot see any discrepancy here. These are the gene counts you get from each source:

URI: 257 genes

http://www.genome.jp/dbget-bin/www_bget?hsa04010

TogoWS REST: 257 genes

http://togows.dbcls.jp/entry/pathway/hsa04010/genes.json

GeneSCF annotation: 260 genes

https://github.com/santhilalsubhash/geneSCF/blob/master/annotation/KEGG_pathway_updated130711_geneID.txt

https://github.com/santhilalsubhash/geneSCF/blob/master/annotation/KEGG_pathway_updated130711_geneSym.txt

modified 4.7 years ago • written 4.7 years ago by EagleEye6.2k

Yeah, the lists are all fine, but I mean in the actual KEGG pathway image. Maybe I just don't understand how they're drawn, but the MAPK pathway image has a little over 100 genes in it (link in my previous comment).

written 4.7 years ago by erikfas20
0
4.7 years ago by
EagleEye6.2k
Sweden
EagleEye6.2k wrote:
A: Gene Set Clustering based on Functional annotation (GeneSCF)

You can download this tool; it ships its databases in plain-text format, covering KEGG, REACTOME and Gene Ontology with their corresponding genes (Entrez IDs and also gene symbols). Let me know if you need any help.

written 4.7 years ago by EagleEye6.2k

Thanks, but I'm not sure that this does what I'm looking for. If I understand your tool correctly, you give it a list of genes and it finds the enrichment/clustering of said genes in various pathways - correct? This is kind of the opposite of what I want, i.e. get a list of genes from a given pathway.

written 4.7 years ago by erikfas20

Yes, but you do not have to use the tool itself. It comes with an annotation folder containing all pathways with their gene lists (tab-separated format), so it has all the genes related to each pathway.

For example, the PI3K-Akt signaling pathway is listed with all 347 related genes. Simple grepping of the files will do: grep "PI3K-Akt signaling pathway".

https://github.com/santhilalsubhash/geneSCF/blob/master/annotation/KEGG_pathway_updated130711_geneSym.txt

https://github.com/santhilalsubhash/geneSCF/blob/master/annotation/KEGG_pathway_updated130711_geneID.txt
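For anyone who prefers a script to grep, the same lookup can be sketched in Python. The exact layout of the annotation files is an assumption here: each line is taken to hold a pathway label, a tab, and then the genes separated by commas (and/or further tabs). The function name `genes_for_pathway` and the sample lines are hypothetical; adjust `gene_sep` after inspecting a downloaded copy of the file.

```python
def genes_for_pathway(lines, pathway_name, gene_sep=","):
    """Collect genes from lines whose first tab-separated field names the pathway.

    ASSUMPTION: each line looks like "<pathway label>\t<gene>,<gene>,...".
    Matching is case-insensitive and by substring, similar to grep -i.
    """
    wanted = pathway_name.lower()
    genes = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or wanted not in fields[0].lower():
            continue
        for field in fields[1:]:
            genes.extend(g.strip() for g in field.split(gene_sep) if g.strip())
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(genes))


# Hand-written lines in the assumed layout; not copied from the real file.
sample_lines = [
    "hsa04151~PI3K-Akt signaling pathway\tAKT1,PIK3CA,PTEN\n",
    "hsa04010~MAPK signaling pathway\tMAPK1,MAPK3,KRAS\n",
]

if __name__ == "__main__":
    print(genes_for_pathway(sample_lines, "PI3K-Akt signaling pathway"))
```

With a local copy of the annotation file, passing the open file object as `lines` works the same way as the in-memory list above.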

modified 4.7 years ago • written 4.7 years ago by EagleEye6.2k