Question: Ensembl exon-to-dna mapping
6.1 years ago, jgbradley1 (United States) wrote:

Hopefully this question isn't too specific. I am using the latest release of the human genome in the Ensembl database (homo_sapiens_core_76_38), and I would like to map exons to their DNA sequence. The database schema seems to indicate that I can take the seq_region_id from the exon table and use it to look up the sequence in the dna table. However, there isn't a DNA sequence for every exon. For example, for the exon with exon_id=28550800, the corresponding seq_region_id does not exist in the dna table. This is my first time using Ensembl, so is there something I'm missing?

Tags: dna, exon, ensembl

Is there a reason you're not just using biomart (that's a query for the exonic sequences of each annotated human exon from release 76)?

Comment written 6.1 years ago by Devon Ryan

Your approach of using biomart will work, but it still doesn't solve my problem of how the seq_region_id in the exon table maps to the seq_region_id in the dna table. Although the columns have the same name, they don't hold the same values in the database. I just did a SQL join between the dna table and the exon table on seq_region_id, and it returned no matching rows, so there appears to be no direct relation between the two tables.

Comment written 6.1 years ago by jgbradley1
6.0 years ago, Emily_Ensembl wrote:

Magali answered this on the Ensembl dev list as follows:

Exons and other features tend to be stored on toplevel sequences, which are generally chromosomes.
Dna sequence however is stored on the contig level.
The assembly table contains information to map a contig sequence to a chromosome.
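As a rough illustration of that mapping, here is a minimal Python 3 sketch that stitches the sequence of a chromosomal region together from contig sequences. It is not Ensembl's own implementation. The column names (asm_seq_region_id, cmp_seq_region_id, asm_start, asm_end, cmp_start, cmp_end, ori) follow the core schema's assembly table, but the function names, the in-memory data layout, and the choice to fill unmapped positions with N are assumptions made for this example. It also assumes the rows passed in do not overlap on the chromosome, and it skips any assembly row whose component has no entry in the dna table (for example a scaffold-level mapping).

```python
from collections import namedtuple

# One row of the `assembly` table, using the core schema's column names.
AssemblyRow = namedtuple(
    "AssemblyRow",
    "asm_seq_region_id cmp_seq_region_id asm_start asm_end cmp_start cmp_end ori",
)

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def region_sequence(chrom_id, start, end, assembly_rows, contig_dna):
    """Forward-strand sequence of chrom_id:start-end (1-based, inclusive).

    assembly_rows: iterable of AssemblyRow (hypothetical in-memory copy).
    contig_dna: dict mapping seq_region_id -> sequence, as in the dna table.
    Positions not covered by any contig are returned as 'N'.
    """
    rows = sorted(
        (r for r in assembly_rows
         if r.asm_seq_region_id == chrom_id
         and r.cmp_seq_region_id in contig_dna   # keep contig-level rows only
         and r.asm_end >= start and r.asm_start <= end),
        key=lambda r: r.asm_start,
    )
    pieces = []
    pos = start  # next chromosome position still to be filled
    for r in rows:
        ov_start = max(pos, r.asm_start)
        ov_end = min(end, r.asm_end)
        if ov_start > ov_end:
            continue
        if ov_start > pos:
            pieces.append("N" * (ov_start - pos))  # assembly gap
        contig = contig_dna[r.cmp_seq_region_id]
        if r.ori == 1:
            c_start = r.cmp_start + (ov_start - r.asm_start)
            c_end = r.cmp_start + (ov_end - r.asm_start)
            pieces.append(contig[c_start - 1:c_end])
        else:
            # Contig lies on the reverse strand of the chromosome.
            c_start = r.cmp_end - (ov_end - r.asm_start)
            c_end = r.cmp_end - (ov_start - r.asm_start)
            pieces.append(revcomp(contig[c_start - 1:c_end]))
        pos = ov_end + 1
    if pos <= end:
        pieces.append("N" * (end - pos + 1))
    return "".join(pieces)


def exon_sequence(chrom_id, start, end, strand, assembly_rows, contig_dna):
    """Exon sequence in transcript orientation (strand is 1 or -1)."""
    seq = region_sequence(chrom_id, start, end, assembly_rows, contig_dna)
    return seq if strand == 1 else revcomp(seq)
```

In practice the arguments would come from the exon table (seq_region_id, seq_region_start, seq_region_end, seq_region_strand), the assembly rows for that chromosome, and the dna rows for the contigs those assembly rows point to.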

Retrieving DNA sequence directly from the MySQL schema is tricky in the best of cases.
This is why we recommend using Biomart, the Perl API, or REST queries for this type of use.
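For the REST route, a minimal Python 3 sketch might look like the following. It assumes the public server at https://rest.ensembl.org and its sequence/id endpoint with the type=genomic parameter. The helper names are made up for this example, and the exon stable ID has to be supplied by the caller. Note that the public server serves the current Ensembl release rather than release 76, so coordinates and IDs may differ from an older local database.

```python
import urllib.request

REST_SERVER = "https://rest.ensembl.org"  # public Ensembl REST server (assumed)


def sequence_url(stable_id, seq_type="genomic"):
    """Build the URL for the sequence/id endpoint (hypothetical helper)."""
    return "{}/sequence/id/{}?type={}".format(REST_SERVER, stable_id, seq_type)


def fetch_sequence(stable_id, seq_type="genomic"):
    """Fetch the sequence for a stable ID as plain text (needs network access)."""
    request = urllib.request.Request(
        sequence_url(stable_id, seq_type),
        headers={"Content-Type": "text/plain"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("ascii").strip()
```

Calling fetch_sequence with an exon stable ID (an ENSE... identifier) should return the exon's genomic sequence as a string, with no need to touch the assembly or dna tables.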

5.8 years ago, Tariq Daouda (IRIC | Institute for Research in Immunology and Cancer) wrote:


I wrote a Python module for this kind of query on Ensembl data. It is called pyGeno and is freely available on GitHub.

Once you've imported the genome into it you can simply do:

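from pyGeno.Genome import *

# Use the name of the genome you imported, e.g. "GRCh37.75"
ref = Genome(name = "GRCh37.75")

# Look up an exon by its Ensembl stable ID and take the first match
exon = ref.get(Exon, id = "EN...")[0]

print(exon.CDS)       # coding part of the exon
print(exon.sequence)  # full exon sequence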

Hope that helps,

