GO Term Mapping
10.9 years ago
ric

Hi, I would like to use goatools (https://github.com/tanghaibao/goatools). For that I need an association file containing the gene-to-GO-term mapping: a two-column tab-separated file whose first column is the gene and whose second column lists its GO terms (separated by ; if there are multiple terms).
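For reference, a minimal sketch of reading and writing a file in the two-column layout described above. It assumes tab-separated columns and `;` between GO IDs, as stated in the question; skipping blank lines and lines starting with `#` is an assumption of this sketch, not documented goatools behaviour. The gene names `geneA`/`geneB` are hypothetical.

```python
import io
from typing import Dict, Set, TextIO


def read_association(handle: TextIO) -> Dict[str, Set[str]]:
    """Parse 'gene<TAB>GO:1;GO:2' lines into {gene: {GO IDs}}."""
    assoc: Dict[str, Set[str]] = {}
    for line in handle:
        line = line.rstrip("\n")
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        gene, terms = line.split("\t", 1)
        assoc.setdefault(gene, set()).update(
            t.strip() for t in terms.split(";") if t.strip()
        )
    return assoc


def write_association(assoc: Dict[str, Set[str]], handle: TextIO) -> None:
    """Write {gene: {GO IDs}} back out, one gene per line, terms joined by ';'."""
    for gene in sorted(assoc):
        handle.write(gene + "\t" + ";".join(sorted(assoc[gene])) + "\n")


# Usage with hypothetical genes:
example = "geneA\tGO:0005634;GO:0003677\ngeneB\tGO:0016020\n"
assoc = read_association(io.StringIO(example))
```

Sorting on output is only there to make the file reproducible; the order of terms should not matter for an association mapping.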

Where can I download such a file for:

* Arabidopsis
* Oryza sativa

Thank you in advance.

10.9 years ago

There are a lot of GO enrichment tools out there; do you need to use this particular package?

FuncAssociate works for Oryza sativa: http://llama.mshri.on.ca/funcassociate/

GOrilla works for Arabidopsis: http://cbl-gorilla.cs.technion.ac.il/

agriGO works with both species: http://bioinfo.cau.edu.cn/agriGO/

In general, GO provides a list of several possible tools: http://www.geneontology.org/GO.tools_by_type.term_enrichment.shtml

10.9 years ago

See the GO Annotation (GOA) downloads page: http://www.ebi.ac.uk/GOA/downloads

10.9 years ago
ric

Thank you for the links. I would also be interested in a generic approach, because I also work on completely new organisms. So I downloaded the following files:

ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping.dat.gz
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gene_association.goa_uniprot.gz
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gp_association.goa_uniprot.gz
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gp_information.goa_uniprot.gz

Then I tried to look up, e.g., the following gene ID: Q8L9A8

$ grep Q8L9A8 gene_association.goa_uniprot
$ grep Q8L9A8 gp_association.goa_uniprot
$ grep Q8L9A8 gp_information.goa_uniprot
Q8L9A8  Q8L9A8  Putative uncharacterized protein        Q8L9A8_ARATH    protein taxon:3702                      db_subset=TrEMBL
$ grep Q8L9A8 idmapping.dat
Q8L9A8  UniProtKB-ID    Q8L9A8_ARATH
Q8L9A8  GI      21595188
Q8L9A8  UniRef100       UniRef100_Q8L9A8
Q8L9A8  UniRef90        UniRef90_Q8L9A8
Q8L9A8  UniRef50        UniRef50_Q9LZ47
Q8L9A8  UniParc UPI000009F1A3
Q8L9A8  EMBL    AY088547
Q8L9A8  EMBL-CDS        AAM66079.1
Q8L9A8  NCBI_TaxID      3702
Q8L9A8  IPI     IPI00517530
Q8L9A8  eggNOG  NOG306527
Q8L9A8  HOGENOM HOG000153184
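The same lookup can be scripted instead of grepped. A minimal sketch, assuming `idmapping.dat` has three tab-separated columns (accession, ID type, value), which is what the output above shows; for the real file you would open it with `gzip.open(path, "rt")` and pass the handle in. The sample lines are taken from the output above.

```python
import io
from typing import Dict, List, TextIO


def lookup_idmapping(handle: TextIO, accession: str) -> Dict[str, List[str]]:
    """Collect {id_type: [values]} for one UniProt accession from idmapping.dat."""
    result: Dict[str, List[str]] = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and lines for other accessions.
        if len(fields) != 3 or fields[0] != accession:
            continue
        _, id_type, value = fields
        result.setdefault(id_type, []).append(value)
    return result


sample = (
    "Q8L9A8\tUniProtKB-ID\tQ8L9A8_ARATH\n"
    "Q8L9A8\tEMBL\tAY088547\n"
    "Q8L9A8\tNCBI_TaxID\t3702\n"
)
mapping = lookup_idmapping(io.StringIO(sample), "Q8L9A8")
# Consistent with the grep above: no "GO" ID type appears for this accession.
has_go = "GO" in mapping
```

Streaming line by line keeps memory flat, which matters because the full file is very large.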

Unfortunately, I could not find any GO term mapped to this gene ID.

Where can I find out which UniProt IDs map to which GO terms?

If you are working on a completely new organism, then there won't be any existing GO terms mapped to your genes, since no one has characterised them yet. You can generate de novo GO terms for your sequences by scanning them against various HMM databases and then mapping GO terms to the HMM database IDs (http://www.geneontology.org/GO.indices.shtml), or you can use Blast2GO (http://www.blast2go.com/b2ghome).
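A rough sketch of the HMM route: read a Pfam-to-GO mapping file and transfer terms to each gene through its domain hits. It assumes the mapping file uses the `Pfam:PFxxxxx name > GO:description ; GO:nnnnnnn` line layout with `!` comment lines, as in GO's external2go files; the sample lines only illustrate that layout. The `domain_hits` dictionary (gene to Pfam accessions) is hypothetical and would come from parsing your own HMM scan output.

```python
import io
from typing import Dict, Iterable, Set, TextIO


def read_pfam2go(handle: TextIO) -> Dict[str, Set[str]]:
    """Parse 'Pfam:PF00001 name > GO:desc ; GO:0004930' lines into {PF acc: {GO IDs}}."""
    mapping: Dict[str, Set[str]] = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("!"):  # '!' marks comment lines
            continue
        left, _, right = line.partition(" > ")
        if not right:
            continue
        pfam_acc = left.split()[0].split(":", 1)[-1]  # "Pfam:PF00001" -> "PF00001"
        go_id = right.rsplit(";", 1)[-1].strip()      # "... ; GO:0004930" -> "GO:0004930"
        mapping.setdefault(pfam_acc, set()).add(go_id)
    return mapping


def transfer_go(domain_hits: Dict[str, Iterable[str]],
                pfam2go: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Give each gene the union of GO terms mapped to its Pfam domains."""
    assoc: Dict[str, Set[str]] = {}
    for gene, accs in domain_hits.items():
        terms: Set[str] = set()
        for acc in accs:
            # Drop a version suffix such as "PF00001.21" before the lookup.
            terms |= pfam2go.get(acc.split(".")[0], set())
        if terms:
            assoc[gene] = terms
    return assoc


sample = (
    "!comment line\n"
    "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930\n"
)
p2g = read_pfam2go(io.StringIO(sample))
```

Genes with no mapped domain are simply left out, so they stay unannotated rather than receiving an empty entry.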

First of all, be careful: the ID you are looking for, Q8L9A8, is a protein ID (UniProt), not a gene ID.

10.9 years ago
Arnaud Ceol

Dear Ric,

The idmapping file contains mappings from UniProt to other sequence databases (RefSeq, GenBank, etc.), but possibly not to GO.

The ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ repository contains the GO annotations for each species. You should be able to download the rice file, and if you are using UniProt references, as it looks like you are, you just have to select the right columns.
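A minimal sketch of "selecting the right columns", assuming the per-species files are in GAF 2.x format: `!` header lines, then tab-separated rows where column 2 is the DB object ID (the UniProt accession), column 3 the symbol, column 4 the qualifier, column 5 the GO ID and column 7 the evidence code. Dropping `NOT` annotations and the optional evidence filter are choices of this sketch. The file name in the usage comment is hypothetical.

```python
import io
from typing import Dict, Optional, Set, TextIO


def gaf_to_association(handle: TextIO, id_column: int = 1,
                       exclude_evidence: Optional[Set[str]] = None) -> Dict[str, Set[str]]:
    """Collapse GAF rows into {object ID: {GO IDs}}.

    id_column is 0-based: 1 = DB object ID (accession), 2 = DB object symbol.
    """
    assoc: Dict[str, Set[str]] = {}
    for line in handle:
        if line.startswith("!") or not line.strip():  # GAF header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:  # GAF 2.x has at least 15 columns
            continue
        qualifier, go_id, evidence = cols[3], cols[4], cols[6]
        if "NOT" in qualifier.split("|"):  # negative annotations are skipped
            continue
        if exclude_evidence and evidence in exclude_evidence:
            continue
        assoc.setdefault(cols[id_column], set()).add(go_id)
    return assoc


def write_association(assoc: Dict[str, Set[str]], handle: TextIO) -> None:
    """Write the two-column file goatools expects: ID<TAB>GO:1;GO:2."""
    for key in sorted(assoc):
        handle.write(key + "\t" + ";".join(sorted(assoc[key])) + "\n")


# Usage sketch (hypothetical file name; GOA files are usually gzipped):
#   import gzip
#   with gzip.open("rice.goa.gz", "rt") as fh, open("rice.assoc", "w") as out:
#       write_association(gaf_to_association(fh), out)
```

Passing `id_column=2` keys the output on gene symbols instead of UniProt accessions, and `exclude_evidence={"IEA"}` would keep only non-electronic annotations if that is what your analysis needs.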

If you really want the gene ID, you can also go to UniProt, search for your species (taxonomy search), and click on the UniProt link. From there, you can customize the display to show only the gene name and GO terms, and export the result as a tab-delimited file.
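Such an export still needs reshaping into the goatools layout. A minimal sketch, assuming the export has a header row with columns named `Gene names` and `Gene ontology IDs` (both names are assumptions; adjust them to whatever your export actually contains), space-separated gene names, and GO IDs separated by `;`. Taking the first listed gene name is also a choice of this sketch.

```python
import csv
import io
from typing import Dict, Set, TextIO


def uniprot_tab_to_association(handle: TextIO,
                               gene_col: str = "Gene names",        # assumed header
                               go_col: str = "Gene ontology IDs",   # assumed header
                               ) -> Dict[str, Set[str]]:
    """Turn a tab-delimited UniProt export into {gene name: {GO IDs}}."""
    assoc: Dict[str, Set[str]] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        names = (row.get(gene_col) or "").split()
        terms = {t.strip() for t in (row.get(go_col) or "").split(";") if t.strip()}
        if not names or not terms:  # skip entries lacking a gene name or GO terms
            continue
        # Use the first listed name as the key; later names are treated as aliases.
        assoc.setdefault(names[0], set()).update(terms)
    return assoc


# Hypothetical export rows, for illustration only:
sample = (
    "Entry\tGene names\tGene ontology IDs\n"
    "P00001\tGENE1 ALIAS1\tGO:0005634; GO:0003677\n"
    "P00002\t\tGO:0016020\n"
)
assoc = uniprot_tab_to_association(io.StringIO(sample))
```

If two entries share a gene name, their GO terms are merged under that name, which is usually what a gene-level association file wants.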

best,

Arnaud
