Question

Ensembl exon-to-dna mapping

9.6 years ago

jgbradley1 ▴ 110

Hopefully this question isn't too specific. I am using the latest release of the human genome in the Ensembl database (homo_sapiens_core_76_38), and I would like to map exons to their DNA sequence. The database schema seems to indicate that I can take the seq_region_id from the exon table and use it to look up rows in the dna table. However, there isn't a DNA sequence for every exon. For example, for the exon with exon_id=28550800, its corresponding seq_region_id does not exist in the dna table. This is my first time using Ensembl, so is there something I'm missing?

dna exon ensembl • 2.8k views

ADD COMMENT • link updated 2.1 years ago by Ram 43k • written 9.6 years ago by jgbradley1 ▴ 110

Is there a reason you're not just using biomart (that's a query for the exonic sequences of each annotated human exon from release 76)?

ADD REPLY • link updated 2.1 years ago by Ram 43k • written 9.6 years ago by Devon Ryan 104k

Your approach of using biomart will work, but it still doesn't solve my problem of how the seq_region_id from the exon table maps to the seq_region_id in the dna table. Although the two columns have the same name, their values don't match in the database. I just did an SQL join between the dna table and the exon table on seq_region_id, and it shows that there is no relation between the two tables.

ADD REPLY • link 9.6 years ago by jgbradley1 ▴ 110

9.3 years ago

Tariq Daouda ▴ 220

Hi,

I wrote a Python module for these kinds of queries on Ensembl data. It's called pyGeno and it is freely available on GitHub: https://github.com/tariqdaouda/pyGeno

Once you've imported the genome into it you can simply do:

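from pyGeno.Genome import *

# Load the reference genome by the name it was imported under
ref = Genome(name = "GRCh37.75")
# Look up an exon by its Ensembl stable ID; get() returns a list
exon = ref.get(Exon, id = "EN...")[0]

print(exon.CDS)       # coding sequence within the exon
print(exon.sequence)  # full exon sequence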

Hope that helps

ADD COMMENT • link updated 2.1 years ago by Ram 43k • written 9.3 years ago by Tariq Daouda ▴ 220

Ram · Accepted Answer · 2014-10-10

Magali answered this on the Ensembl dev list as follows:

Exons and other features tend to be stored on toplevel sequences, which are generally chromosomes. DNA sequence, however, is stored at the contig level. The assembly table contains the information needed to map a contig sequence to a chromosome.
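To make the chromosome-to-contig mapping concrete, here is a minimal Python sketch of the coordinate arithmetic. It is not the Ensembl API's implementation. It assumes you have already selected the assembly rows (asm_start, asm_end, cmp_seq_region_id, cmp_start, cmp_end, ori) whose asm_seq_region_id matches the exon's seq_region_id and whose component is a contig, and that you have loaded the matching dna.sequence strings into a dictionary. The names AssemblyRow, region_sequence and exon_sequence are made up for illustration, and gaps between contigs are simply skipped rather than padded with N.

```python
from typing import Dict, Iterable, NamedTuple


class AssemblyRow(NamedTuple):
    """One row of the assembly table (hypothetical in-memory form)."""
    asm_start: int          # 1-based inclusive start on the chromosome
    asm_end: int            # 1-based inclusive end on the chromosome
    cmp_seq_region_id: int  # contig seq_region_id, the key into the dna table
    cmp_start: int          # 1-based inclusive start on the contig
    cmp_end: int            # 1-based inclusive end on the contig
    ori: int                # 1 = same orientation, -1 = contig is reversed


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def region_sequence(start: int, end: int,
                    rows: Iterable[AssemblyRow],
                    contig_dna: Dict[int, str]) -> str:
    """Forward-strand chromosome sequence for [start, end], built from contigs."""
    pieces = []
    for row in sorted(rows, key=lambda r: r.asm_start):
        # Part of the requested region covered by this assembly row.
        lo = max(start, row.asm_start)
        hi = min(end, row.asm_end)
        if lo > hi:
            continue  # no overlap with this contig
        seq = contig_dna[row.cmp_seq_region_id]
        if row.ori == 1:
            c_lo = row.cmp_start + (lo - row.asm_start)
            c_hi = row.cmp_start + (hi - row.asm_start)
            pieces.append(seq[c_lo - 1:c_hi])
        else:
            # Reversed contig: chromosome asm_start lines up with cmp_end.
            c_lo = row.cmp_end - (hi - row.asm_start)
            c_hi = row.cmp_end - (lo - row.asm_start)
            pieces.append(revcomp(seq[c_lo - 1:c_hi]))
    return "".join(pieces)


def exon_sequence(start: int, end: int, strand: int,
                  rows: Iterable[AssemblyRow],
                  contig_dna: Dict[int, str]) -> str:
    """Exon sequence given seq_region_start/end/strand from the exon table."""
    seq = region_sequence(start, end, rows, contig_dna)
    return seq if strand == 1 else revcomp(seq)


# Toy data: two 8 bp contigs, the second placed in reverse orientation.
toy_rows = [AssemblyRow(1, 8, 10, 1, 8, 1), AssemblyRow(9, 16, 20, 1, 8, -1)]
toy_dna = {10: "AAAACCCC", 20: "GGATTTCA"}
print(exon_sequence(7, 11, 1, toy_rows, toy_dna))
```

The toy exon spans both contigs, which is the case a plain join between exon and dna on seq_region_id cannot handle.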

Retrieving DNA sequence directly from the MySQL schema is tricky in the best of cases. This is why we recommend using BioMart, the Perl API or REST queries for this type of use.
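As a rough illustration of the REST route, the sketch below builds a request for the Ensembl REST sequence/id endpoint and fetches the sequence as plain text. Some assumptions to be aware of: the public server at rest.ensembl.org serves the current release rather than release 76, the endpoint expects a stable ID (ENSE...) rather than the internal exon_id from the exon table, and the helper names and the placeholder ID are made up for this example.

```python
import urllib.request
from urllib.parse import quote

# Public Ensembl REST server (current release, not an archived one).
REST_SERVER = "https://rest.ensembl.org"


def exon_sequence_url(stable_id: str, server: str = REST_SERVER) -> str:
    """Build the sequence/id URL for a stable ID, asking for plain text."""
    return f"{server}/sequence/id/{quote(stable_id)}?content-type=text/plain"


def fetch_exon_sequence(stable_id: str, timeout: float = 30.0) -> str:
    """Download the sequence for a stable ID (needs network access)."""
    with urllib.request.urlopen(exon_sequence_url(stable_id), timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()


# Placeholder ID for illustration only; substitute a real exon stable ID.
print(exon_sequence_url("ENSE00000000000"))
```

Calling fetch_exon_sequence with a real exon stable ID should return that exon's sequence as a single string.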