How To Map Protein Names From Ensembl Metazoa To JGI Names
11.0 years ago
Dror ▴ 280

I want to map all the protein names from OrthoMCL5 groups to Ensembl. I need this for basal metazoans such as Nematostella and Trichoplax. However, the names in OrthoMCL come from the JGI database and do not match the Ensembl names. How can I do this? I am sure the Ensembl proteins are derived from the JGI databases, so how did they translate the names? Code in Python, my favorite language, would be best, but other solutions are welcome.

orthomcl ensembl python mapping • 2.5k views
11.0 years ago

The Ensembl helpdesk is usually very efficient at answering questions such as this.

In your case, have a look at the BioMart output from Ensembl Metazoa. Include the attribute "Associated Gene DB", as external IDs are often stored there.

Comparing known genes in both databases should show you commonalities in naming convention.

In the case of Nematostella you get:

Ensembl ID NEMVEDRAFT_v1g222669 with Associated Gene DB = v1g222669.
Looking at the corresponding gene in JGI, you get an ID of 222669. I would therefore assume that simply removing everything up to and including "v1g" gives you the JGI ID.

And for Trichoplax:

TriadG55444; TriadT55444; TRIADDRAFT_55444 = JGI number 55444
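Since you asked for Python, here is a minimal sketch of that prefix stripping. The patterns are inferred only from the two examples above, so treat them as an assumption rather than an official Ensembl rule, and check a handful of genes against JGI before trusting the result. The function name `ensembl_to_jgi` is just something I made up for illustration.

```python
import re

# Patterns guessed from the examples in this answer (assumption, not an
# official Ensembl Metazoa naming rule). Each captures the numeric JGI ID.
_PATTERNS = [
    re.compile(r"^NEMVEDRAFT_v1g(\d+)$"),  # Nematostella Ensembl ID
    re.compile(r"^v1g(\d+)$"),             # Nematostella "Associated Gene DB" value
    re.compile(r"^TRIADDRAFT_(\d+)$"),     # Trichoplax Ensembl ID
    re.compile(r"^Triad[GT](\d+)$"),       # Trichoplax TriadG / TriadT forms
]


def ensembl_to_jgi(identifier):
    """Return the numeric JGI ID as a string, or None if no pattern matches."""
    identifier = identifier.strip()
    for pattern in _PATTERNS:
        match = pattern.match(identifier)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    for name in ["NEMVEDRAFT_v1g222669", "v1g222669", "TriadG55444",
                 "TriadT55444", "TRIADDRAFT_55444"]:
        print(name, "->", ensembl_to_jgi(name))
```

Returning None for anything unrecognised, instead of guessing, makes it easy to list the IDs that do not follow the assumed convention and look at them by hand.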


This is correct, since Ensembl Metazoa attempts to give all classes of objects a unique identifier. When you have IDs like fgenesh1_pg.scaffold_688000001, an identifier assigned by FGENESH, the chance of clashing IDs is very high. You will see this pattern in all Ensembl Genomes databases where Ensembl Genomes has not taken responsibility for generating the stable IDs.

