Ensembl Exon Phase Notation
11.2 years ago
HAlit ▴ 20

Hey,

I would like to retrieve some exon sequences, translate them into amino acid sequences, and then BLAST them against a proteome.

I am working with exon sequences from Ensembl.

Ensembl uses something called phase to note the codons interrupted by introns, as follows:

In the diagrams below, N and # mark the bases of alternating codons (so that codon boundaries are visible), and x marks intron bases.

Exon start phase: 0 - no interruption. NNNxxxxxxNNN###NNN###

Exon start phase: 1 - first codon's first base is in the previous exon. NxxxxxxxNN###NNN###NNN

Exon start phase: 2 - first codon's first two bases are in the previous exon. NNxxxxxxxxN###NNN###NNN

In addition to start phase, there is also an end phase, which works similarly.

Exon end phase: 1 - last codon's last base is in the next exon. NNN###NNxxxxxxxN

Exon end phase: 2 - last codon's last two bases are in the next exon. NNN###NxxxxxxNN

I assume these descriptions are correct - please let me know if they are not.

I downloaded the phase information using BioMart so that I can later map it back to the exon sequences and remove these interrupted codons. The problem is that BioMart provides only a single phase value per exon, which I guess is the start phase. Does anyone know why the end phase is missing?

Thank you

ensembl exon • 6.6k views

Sorry, I think you are wrong. The end phase of one exon and the start phase of the next exon should be in frame (meaning they have the same phase), and your explanation is not consistent with that.

The position of an exon/intron boundary within a codon. A phase of zero means the boundary falls between codons, one means between the first and second base and two means between the second and third base. Exons have a start and end phase, whereas introns have just one phase. A boundary in a non-coding region has a phase of -1.
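To make the definition concrete, here is a minimal Python sketch. It is not an Ensembl or BioMart API. It assumes a fully coding exon whose start and end phases are both known and follow the definition quoted above; the names `trim_to_full_codons` and `translate` are made up for illustration. With a phase of p, p bases of the split codon lie upstream of the boundary, so the exon begins with (3 - start phase) % 3 leftover bases and ends with as many leftover bases as its end phase.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def trim_to_full_codons(seq, start_phase, end_phase):
    """Drop the bases of codons that are split by the flanking introns.

    Assumption: the exon is entirely coding, so both phases are 0, 1 or 2.
    """
    if start_phase not in (0, 1, 2) or end_phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2 (-1 marks a non-coding boundary)")
    lead = (3 - start_phase) % 3  # bases finishing a codon begun in the previous exon
    trail = end_phase             # bases starting a codon finished in the next exon
    return seq[lead:len(seq) - trail]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


# Hypothetical 9-base exon: 2 leftover bases, two full codons, 1 leftover base.
exon = "GGATGAAAT"
print(translate(trim_to_full_codons(exon, start_phase=1, end_phase=1)))
```

Note that trimming discards the residues encoded by the split codons, which is usually acceptable for a BLAST search but shortens each peptide by up to two residues.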

11.2 years ago
Matt LaFave ▴ 310

You're correct in your interpretation of the descriptions, and in assuming that the "phase" listed in BioMart is the start phase. (You'll also run into a phase labeled -1, which I believe means that the start of that exon is non-coding.)

I'm not entirely sure why the end phase is missing, but if I had to guess, it's because it's essentially redundant information. If, for whatever reason, you find that you need both the start and end phase of a given exon, you can determine the end phase by looking at the start phase of the downstream exon. Hope that helps!
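As a rough sketch of that lookup in Python: given one transcript's exons in rank order, each exon's end phase is taken from the start phase of the next exon. The `(exon_id, start_phase)` tuples and the function name are hypothetical, not a BioMart export format, and the last exon has no downstream exon, so its end phase is left as `None`.

```python
def end_phases_from_downstream(exons):
    """Map exon_id -> end phase, using the next exon's start phase.

    exons: list of (exon_id, start_phase) tuples for ONE transcript,
    already sorted by exon rank (an assumption about the input).
    """
    end_phases = {}
    for (exon_id, _), (_, next_start) in zip(exons, exons[1:]):
        end_phases[exon_id] = next_start
    if exons:
        end_phases[exons[-1][0]] = None  # no downstream exon to consult
    return end_phases


# Hypothetical transcript with three exons.
print(end_phases_from_downstream([("E1", -1), ("E2", 0), ("E3", 2)]))
```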


Thanks Matt. Indeed, I do realize that it's essentially redundant information given the phase of the downstream exon. However, since I am interested in performing an exon-centric analysis, I would prefer to carry out my objective without bothering with the downstream exon. I guess it is just missing, which is strange.

Regarding -1, you are right, it's for UTR exons.
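For an exon-centric workflow, one possible workaround is to compute the end phase from the exon itself. This is only a sketch under a strong assumption: the exon is entirely coding, so every base belongs to a codon and the end phase is (start phase + exon length) mod 3. Exons that contain UTR (start phase -1) would need CDS coordinates instead and are rejected here. The function name is made up for illustration.

```python
def end_phase_from_length(start_phase, exon_length):
    """End phase of a fully coding exon, derived without the downstream exon."""
    if start_phase not in (0, 1, 2):
        raise ValueError("only fully coding exons (start phase 0, 1 or 2) are supported")
    if exon_length <= 0:
        raise ValueError("exon_length must be positive")
    return (start_phase + exon_length) % 3


# A 9-base exon starting in phase 1 also ends in phase 1.
print(end_phase_from_length(1, 9))
```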
