Extracting the list of genes associated with each pathway from a plain text file
7.7 years ago
moonlhy2010 ▴ 10

I have downloaded a file that contains all the human pathways. Its structure looks like the excerpt below. What I want to do is extract each pathway together with all the genes in it. I know this could be done with Perl or Python, but I don't know how.

#ENTRY       hsa00001;
#NAME        T01001;
#DEFINITION  KEGG Orthology (KO) - Homo sapiens (human);
#--->;
!;
A<b>Metabolism</b>;
B;
B  <b>Overview</b>;
C    01200 Carbon metabolism [PATH:hsa01200];
D      3101 HK3; hexokinase 3";"K00844 HK; hexokinase [EC:2.7.1.1]
D      3098 HK1; hexokinase 1";"K00844 HK; hexokinase [EC:2.7.1.1]
D      3099 HK2; hexokinase 2";"K00844 HK; hexokinase [EC:2.7.1.1]
D      80201 HKDC1; hexokinase domain containing 1";"K00844 HK; hexokinase [EC:2.7.1.1]
D      2645 GCK; glucokinase";"K12407 GCK; glucokinase [EC:2.7.1.2]
D      83440 ADPGK; ADP dependent glucokinase";"K08074 ADPGK; ADP-dependent glucokinase [EC:2.7.1.147]
D      2821 GPI; glucose-6-phosphate isomerase";"K01810 GPI; glucose-6-phosphate isomerase [EC:5.3.1.9]
D      5213 PFKM; phosphofructokinase, muscle";"K00850 pfkA; 6-phosphofructokinase 1 [EC:2.7.1.11]
D      5214 PFKP; phosphofructokinase, platelet";"K00850 pfkA; 6-phosphofructokinase 1 [EC:2.7.1.11]
D      5211 PFKL; phosphofructokinase, liver type";"K00850 pfkA; 6-phosphofructokinase 1 [EC:2.7.1.11]
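A minimal Python sketch of one way to do this, assuming the whole file follows the layout of the excerpt above: each `C` line opens a pathway (the map ID sits in `[PATH:...]`), and the `D` lines that follow list that pathway's genes as `ID SYMBOL; description` (for hsa, the leading number is the NCBI Gene/Entrez ID). The script name `extract_pathways.py` and the function names are hypothetical. `D` lines without a `SYMBOL;` field are skipped, and a gene listed twice under the same pathway is kept once.

```python
import re
import sys

# A "C" line opens a pathway, e.g. "C    01200 Carbon metabolism [PATH:hsa01200];"
PATHWAY_RE = re.compile(r"^C\s+(\d+)\s+(.*?)\s*(?:\[PATH:(\w+)\])?;?\s*$")
# A "D" line is a gene, e.g. 'D      3101 HK3; hexokinase 3";"K00844 ...'
GENE_RE = re.compile(r"^D\s+(\d+)\s+([^;\s]+);")


def parse_pathways(lines):
    """Return {pathway_id: (pathway_name, [(gene_id, symbol), ...])}."""
    pathways = {}
    current = None  # pathway that the following D lines belong to
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("C"):
            match = PATHWAY_RE.match(line)
            if match:
                number, name, path_id = match.groups()
                # Fall back to the bare map number if there is no [PATH:...] tag.
                current = path_id or number
                pathways.setdefault(current, (name, []))
            else:
                current = None
        elif line.startswith("D") and current is not None:
            match = GENE_RE.match(line)
            if match:  # D lines without a "SYMBOL;" field are skipped
                gene = (match.group(1), match.group(2))
                genes = pathways[current][1]
                if gene not in genes:  # keep each gene once per pathway
                    genes.append(gene)
        elif line[:1] in ("A", "B"):
            current = None  # a new category line closes the current pathway
    return pathways


def main(path):
    with open(path) as handle:
        pathways = parse_pathways(handle)
    # One output line per pathway: ID <TAB> name <TAB> comma-separated symbols
    for path_id, (name, genes) in pathways.items():
        symbols = ",".join(symbol for _, symbol in genes)
        print(f"{path_id}\t{name}\t{symbols}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage, with a hypothetical input file name: `python extract_pathways.py pathways.txt > pathway_genes.tsv`. Keeping the Entrez ID next to each symbol in `parse_pathways` means you can also write out IDs instead of symbols by changing one line in `main`.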
7.7 years ago
EagleEye 7.6k

You can download pathways/Gene Ontology terms and their associated genes as plain text files using GeneSCF.

Example: For Homo sapiens [KEGG organism code: hsa]

./prepare_database -db=KEGG -org=hsa

The above command downloads the complete human KEGG database as a plain text file to the following location: 'geneSCF-tool/class/lib/db/hsa/'

Another example: A: How to look up GO terms associated to a certain organism?

I found a similar question from you.


Yes, this works. It gives me all the KEGG IDs, but I need the gene symbols.


Because some Entrez gene IDs have multiple gene symbols, I did not include gene symbol retrieval for KEGG. You can convert the Entrez IDs in the 'KEGG_database.txt' file you got from GeneSCF to gene symbols using the information from KEGG.

For human, for example, use http://rest.kegg.jp/list/hsa

Note: you can use the simple script 'geneSCF_tool/class/scripts/mappingIDS.pl' from GeneSCF to map the IDs.

perl geneSCF_tool/class/scripts/mappingIDS.pl KEGG_database.txt IDsWithSym.txt > KEGG_database_sym.txt
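If Perl is not an option, the same kind of mapping can be sketched in Python. This is not a port of `mappingIDS.pl` (its internals are not shown here), and the layout of `KEGG_database.txt` is an assumption: each line is taken to end in a tab-separated column of comma-separated Entrez IDs, optionally wrapped in double quotes. The symbol table is read in the `EntrezID<TAB>SingleGeneSymbol` format described in this reply. The function names are hypothetical, and IDs without a symbol are left unchanged.

```python
import sys


def load_symbol_map(lines):
    """Read 'EntrezID<TAB>SingleGeneSymbol' lines into a dict."""
    mapping = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2 and parts[0] and parts[1]:
            mapping[parts[0]] = parts[1]
    return mapping


def map_line(line, mapping):
    """Replace the comma-separated IDs in the last tab-separated column.

    Assumption: the last column looks like 3101,3098 or "3101,3098".
    Lines without a tab are returned unchanged.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        return line.rstrip("\n")
    raw = fields[-1]
    quoted = len(raw) >= 2 and raw[0] == raw[-1] == '"'
    ids = raw.strip('"').split(",")
    # Unknown IDs fall back to themselves.
    mapped = ",".join(mapping.get(i.strip(), i.strip()) for i in ids)
    fields[-1] = f'"{mapped}"' if quoted else mapped
    return "\t".join(fields)


def main(database_path, symbols_path):
    with open(symbols_path) as handle:
        mapping = load_symbol_map(handle)
    with open(database_path) as handle:
        for line in handle:
            print(map_line(line, mapping))


if __name__ == "__main__" and len(sys.argv) > 2:
    main(sys.argv[1], sys.argv[2])
```

Check the output against the real `KEGG_database.txt` before relying on it; if the IDs live in a different column, only `map_line` needs to change.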
  

Format of IDsWithSym.txt (EntrezID<TAB>SingleGeneSymbol): prepare it from the link mentioned above, as shown below. (Which symbol to keep is a hard decision to make.)

7325<TAB>UBE2E2
5025<TAB>BRE1
  

For other organisms, use http://rest.kegg.jp/list/[KEGG_organism_codes]
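A minimal Python sketch for building IDsWithSym.txt from the KEGG `list` output linked above. It assumes each line is tab-separated, starts with an `org:ID` field, and ends with a field of the form `SYM1, SYM2, ...; description`; entries without a `;` (no symbol) are skipped. Keeping only the first listed symbol is my own choice for the "which symbol to keep" problem, not a KEGG rule. For hsa the KEGG gene IDs are Entrez IDs; for other organisms they may be locus tags instead. The function names are hypothetical.

```python
import sys
import urllib.request

KEGG_LIST_URL = "http://rest.kegg.jp/list/{org}"  # URL pattern from the reply above


def parse_kegg_list(lines):
    """Yield (gene_id, first_symbol) pairs from KEGG 'list' output.

    Assumption: tab-separated lines, first field 'org:ID', last field
    'SYM1, SYM2, ...; description'. Lines without ';' are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or ":" not in fields[0]:
            continue
        gene_id = fields[0].split(":", 1)[1]
        names = fields[-1]
        if ";" not in names:
            continue  # no gene symbol for this entry
        first = names.split(";", 1)[0].split(",")[0].strip()
        if first:
            yield gene_id, first


def main(org):
    url = KEGG_LIST_URL.format(org=org)
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    # Write EntrezID<TAB>SingleGeneSymbol, one gene per line.
    for gene_id, symbol in parse_kegg_list(text.splitlines()):
        print(f"{gene_id}\t{symbol}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

For example, `python kegg_symbols.py hsa > IDsWithSym.txt` (script name hypothetical). Swapping the "first symbol" rule for another one only touches the `first = ...` line.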


Downloading all the gene IDs with GeneSCF is simple, but, as you said, it is hard to decide which symbols to use when mapping the IDs to gene symbols. I would prefer to extract the gene symbols from the file I already have. Thanks anyway.
