Question: Ensembl Exon Phase Notation
7.4 years ago
HAlit20 wrote:


I would like to retrieve some exon sequences, translate them to amino acid sequences and then blast against some proteome.

I am working with exon sequences from Ensembl.

Ensembl uses something called phase to note the codons interrupted by introns, as follows:

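Let N and # denote the bases of alternating codons (so that codon boundaries are visible), and x an intron base. The exon of interest is the one after the intron in the start-phase examples and the one before the intron in the end-phase examples.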

Exon start phase: 0 - no interruption. NNNxxxxxxNNN###NNN###

Exon start phase: 1 - first codon's first base is in the previous exon. NxxxxxxxNN###NNN###NNN

Exon start phase: 2 - first codon's first two bases are in the previous exon. NNxxxxxxxxN##NNN###NNN

In addition to start phase, there is also an end phase, which works similarly.

Exon end phase: 1 - only the first base of the last codon is in this exon; its last two bases are in the next exon. NNN###NxxxxxxNN

Exon end phase: 2 - the first two bases of the last codon are in this exon; its last base is in the next exon. NNN###NNxxxxxxN

I assume these descriptions are correct - please let me know if they are not.

I downloaded phase information using BioMart so that I can later map it back to the exon sequences and remove these interrupted codons. The problem is that BioMart provides only a single phase value per exon, which I guess is the start phase. Does anyone know why the end phase is missing?
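To make the goal concrete, here is a minimal sketch of the trimming step in Python. It assumes the Ensembl convention described above (a start phase of p means p bases of the exon's first codon lie in the previous exon) and that the exon is entirely coding. The function names `trim_to_whole_codons` and `translate` are made up for illustration and are not part of any Ensembl or BioMart API.

```python
# Sketch only. Assumes the exon is fully coding and that start_phase follows
# the Ensembl convention: start_phase bases of the first codon are upstream.

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def trim_to_whole_codons(exon_seq, start_phase):
    """Drop the interrupted codons at both ends of a fully coding exon.

    Returns None for phase -1, where the exon starts in non-coding sequence
    and the reading frame cannot be derived from the phase alone.
    """
    if start_phase == -1:
        return None
    if start_phase not in (0, 1, 2):
        raise ValueError("start_phase must be -1, 0, 1 or 2")
    # Bases at the 5' end that complete a codon begun in the previous exon.
    skip = (3 - start_phase) % 3
    body = exon_seq[skip:]
    # Drop the 1 or 2 trailing bases of a codon that continues downstream.
    return body[: len(body) - len(body) % 3]


def translate(seq):
    """Translate an in-frame DNA sequence; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


if __name__ == "__main__":
    trimmed = trim_to_whole_codons("GGATGAAACC", 1)
    print(trimmed, translate(trimmed))
```

Exons that are only partly coding (for example, the last coding exon with a 3' UTR attached) would need their CDS coordinates as well, which this sketch does not handle.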

Thank you

ensembl exon • 3.9k views
7.4 years ago
Matt LaFave290
San Diego, CA
Matt LaFave290 wrote:

You're correct in your interpretation of the descriptions, and in assuming that the "phase" listed in BioMart is the start phase. (You'll also run into a phase labeled -1, which I believe means that the start of that exon is non-coding.)

I'm not entirely sure why the end phase is missing, but if I had to guess, it's because it's essentially redundant information. If, for whatever reason, you find that you need both the start and end phase of a given exon, you can determine the end phase by looking at the start phase of the downstream exon. Hope that helps!
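As a rough illustration of that redundancy, the Python sketch below derives end phases in two ways: from the start phase of the downstream exon, and from the exon's own start phase and length. The second route relies on the relationship end phase = (start phase + exon length) mod 3, which I believe holds for fully coding exons, where the end phase counts the bases of the last, incomplete codon that remain in this exon. Both function names are hypothetical, and exons that are partly UTR are not handled.

```python
# Sketch only. Assumes fully coding exons and Ensembl-style phases (-1, 0, 1, 2).

def end_phases_from_downstream(start_phases):
    """End phase of each exon = start phase of the next exon in the transcript.

    start_phases must be in transcript (5' to 3') order. The last exon has
    no downstream neighbour, so its end phase is reported as None.
    """
    return list(start_phases[1:]) + [None]


def end_phase_from_length(start_phase, exon_length):
    """End phase computed from the exon alone, without the downstream exon.

    Returns None when the start phase is -1 (exon starts in non-coding
    sequence), because the frame is then not known from the phase.
    """
    if start_phase == -1:
        return None
    return (start_phase + exon_length) % 3


if __name__ == "__main__":
    print(end_phases_from_downstream([0, 2, 1]))
    print(end_phase_from_length(1, 10))
```

The length-based version may be the more convenient one for an exon-by-exon analysis, since it needs nothing beyond the exon's own phase and sequence length.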

7.4 years ago by Matt LaFave290

Thanks Matt. Indeed, I do realize that it's essentially redundant information, given the phase of the downstream exon. However, since I am interested in performing an exon-centric analysis, I would prefer to carry out my objective without bothering with the downstream exon. I guess it is just missing - strange.

Regarding -1, you are right, it's for UTR exons.

7.4 years ago by HAlit20
