Question: How to convert exon genomic coordinates to protein coordinates
Asked 3.6 years ago by carlota.rubio (Spain):

I am trying to convert exon start/end positions from genomic coordinates to protein coordinates. I have been using the Ensembl BioMart exon attributes "Genomic coding start" and "Genomic coding end", which I assume give the start and end of only the coding part of each exon. So if, for each gene, I start from exon 1 and count in steps of three from the genomic coding start, I should get a whole number of amino acids per exon (and then I can easily convert from genomic to protein coordinates). In other words, (Genomic coding start - Genomic coding end) should be divisible by 3. However, it is not.

I have also tried the CDS start/end and cDNA start/end attributes, but the same thing happens. Any clue what is going on here? Thank you.

Tags: exon, gene, genome

start from exon 1 and I start counting by three from Genomic coding start I should have an exact number of aminoacids

You're wrong: exons may contain UTRs.

(comment by Pierre Lindenbaum, 3.6 years ago)
Answered 3.6 years ago by Juke-34 (Sweden):

First of all, not all exons are coding (some are entirely UTR), and some are only partially coding. So you have to work with CDS coordinates.

Secondly, to calculate a length you must do (end-start+1).
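Both points can be sketched in a few lines of Python: clip each exon to the CDS span (an exon entirely outside it is pure UTR), then compute lengths with the +1 needed for inclusive coordinates. This is a minimal sketch assuming 1-based, inclusive coordinates as used by Ensembl; the function names and the toy coordinates are hypothetical, not taken from any real transcript except the one GATA3 exon from later in this thread.

```python
from typing import Optional, Tuple


def coding_part(exon_start: int, exon_end: int,
                cds_start: int, cds_end: int) -> Optional[Tuple[int, int]]:
    """Return the coding sub-interval of an exon, or None if it is all UTR.

    Hypothetical helper. Coordinates are 1-based and inclusive; the CDS span
    runs from the first to the last coding base on the genome.
    """
    start, end = max(exon_start, cds_start), min(exon_end, cds_end)
    return (start, end) if start <= end else None


def length(start: int, end: int) -> int:
    """Both endpoints belong to the interval, hence the +1."""
    return end - start + 1


# Toy numbers (hypothetical), not real GATA3 coordinates.
cds = (180, 900)
print(coding_part(10, 90, *cds))    # None: the exon is entirely 5' UTR
part = coding_part(100, 250, *cds)  # partially coding exon
print(part, length(*part))          # (180, 250) 71

# Coding part of GATA3 exon 2 from the table in this thread.
print(length(8097619, 8097859))     # 241
```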

Best

I forgot to add the +1 in my comment, sorry. About UTRs: as far as I know they are not considered coding, so they should be outside the Genomic coding start/end attributes, right? I have tried using CDS coordinates, but the same thing happens (e.g. the GATA3 canonical transcript in Ensembl v70).
(reply by carlota.rubio)

What about "start - end" instead of "end - start"? Did you notice that error?

Right, UTRs are outside the genomic coding start/end. The thing is that the gene has several coding exons (5 in human), so you should extract the length of each of them, sum all the lengths and then divide by 3. Did you do that?
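The check suggested here can be written out with the GATA3 coding coordinates that appear in the table later in this thread. A minimal sketch assuming 1-based, inclusive coordinates; the per-exon remainders show that individual coding exons need not be multiples of 3 even though their sum is.

```python
# Genomic coding start/end of each coding exon of GATA3 ENST00000379328
# (exons 2-6; exon 1 is non-coding), copied from the table in this thread.
CODING_EXONS = [
    (8097619, 8097859),  # exon 2
    (8100268, 8100804),  # exon 3
    (8105956, 8106101),  # exon 4
    (8111436, 8111561),  # exon 5
    (8115702, 8115986),  # exon 6
]


def coding_lengths(exons):
    """Inclusive length (end - start + 1) of each coding segment."""
    return [end - start + 1 for start, end in exons]


lengths = coding_lengths(CODING_EXONS)
total = sum(lengths)
print(lengths)                       # [241, 537, 146, 126, 285]
print([n % 3 for n in lengths])      # [1, 0, 2, 0, 0]: exons alone are not multiples of 3
print(total, total % 3, total // 3)  # 1335 0 445: the total is
```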

 

(reply by Juke-34)

The thing is, I am not interested in the length of the exons in amino acids. I was using this measure to see whether I could translate exon start/end genomic coordinates directly into protein coordinates. Here is the example for GATA3 (EXON_START and EXON_END are the genomic coding start and end of each exon):

ENSG             ENST             EXON_START  EXON_END  EXON  (EXON_END-EXON_START+1)/3
ENSG00000107485  ENST00000379328  -           -         1     -
ENSG00000107485  ENST00000379328  8097619     8097859   2     80.3333333333
ENSG00000107485  ENST00000379328  8100268     8100804   3     179
ENSG00000107485  ENST00000379328  8105956     8106101   4     48.6666666667
ENSG00000107485  ENST00000379328  8111436     8111561   5     42
ENSG00000107485  ENST00000379328  8115702     8115986   6     95

 

My idea was that 8097619 is the first genomic coding position of GATA3, so 8097619-8097621 corresponds to protein position 1, 8097622-8097624 to protein position 2, etc. However, this cannot be true if the exon length given by the genomic coding start and end is not divisible by 3. (The same happens if you use CDS coordinates.)
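The mapping described above does work if positions are counted along the spliced CDS, carrying the offset from one coding exon to the next instead of restarting at each exon. A minimal sketch, assuming (genomic coding start, genomic coding end) pairs listed in transcript order with 1-based, inclusive coordinates as in the table; `genomic_to_protein` is a hypothetical helper, not an Ensembl or BioMart function. GATA3 is on the forward strand, as the increasing coordinates suggest; for a minus-strand gene the offset within an exon is counted from its end.

```python
from typing import List, Tuple

# GATA3 ENST00000379328: (genomic coding start, genomic coding end) of each
# coding exon (exons 2-6), copied from the table above. 1-based, inclusive,
# transcript order.
CODING_EXONS: List[Tuple[int, int]] = [
    (8097619, 8097859),
    (8100268, 8100804),
    (8105956, 8106101),
    (8111436, 8111561),
    (8115702, 8115986),
]


def genomic_to_protein(pos: int, exons: List[Tuple[int, int]],
                       strand: int = 1) -> Tuple[int, int]:
    """Map a genomic position in the CDS to (codon number, base 1-3 in codon).

    Hypothetical helper. `exons` must be in transcript order (5' to 3').
    On the minus strand (strand=-1) the offset inside an exon is counted
    from its genomic end, because transcription runs toward lower coordinates.
    """
    offset = 0  # coding bases in all exons before the current one
    for start, end in exons:
        if start <= pos <= end:
            within = pos - start if strand == 1 else end - pos
            cds_pos = offset + within  # 0-based position in the spliced CDS
            return cds_pos // 3 + 1, cds_pos % 3 + 1
        offset += end - start + 1
    raise ValueError(f"{pos} is not inside any coding exon")


for pos in (8097619, 8097622, 8097859, 8100268, 8115986):
    print(pos, genomic_to_protein(pos, CODING_EXONS))
# 8097619 (1, 1)
# 8097622 (2, 1)
# 8097859 (81, 1)
# 8100268 (81, 2)
# 8115986 (445, 3)
```

Codon 81 starts on the last base of exon 2 and finishes at the start of exon 3. If the coding coordinates include the stop codon, the last codon maps to the stop rather than to an amino acid.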

(reply by carlota.rubio)

Thanks for the example. It will be easier to explain.

Using the genomic coding start and end positions, you are right that the length is divisible by 3, BUT ONLY THE TOTAL! Some codons are split over two exons.

Your total is 445 codons:

Total length = 241 + 537 + 146 + 126 + 285 = 1335 nt

And 1335 / 3 = 445.

I hope you understand it now. :)
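To see exactly which codons are split, one can track how many bases of an unfinished codon are left at the 3' end of each coding exon. A minimal sketch using the GATA3 coordinates from the table above (forward strand, transcript order, 1-based inclusive); `split_codons` is a hypothetical helper name.

```python
# (exon number, genomic coding start, genomic coding end) for GATA3
# ENST00000379328, copied from the table above: forward strand, transcript order.
CODING_EXONS = [
    (2, 8097619, 8097859),
    (3, 8100268, 8100804),
    (4, 8105956, 8106101),
    (5, 8111436, 8111561),
    (6, 8115702, 8115986),
]


def split_codons(exons):
    """Return (codon number, upstream exon, downstream exon, bases in upstream exon)
    for every codon that spans an exon-exon junction. Hypothetical helper."""
    splits = []
    cds_len = 0  # coding bases up to and including the current exon
    for (exon, start, end), (next_exon, _, _) in zip(exons, exons[1:]):
        cds_len += end - start + 1
        leftover = cds_len % 3  # bases of an unfinished codon at this exon's 3' end
        if leftover:
            splits.append((cds_len // 3 + 1, exon, next_exon, leftover))
    return splits


for codon, up, down, n in split_codons(CODING_EXONS):
    print(f"codon {codon}: {n} base(s) in exon {up}, {3 - n} in exon {down}")
# codon 81: 1 base(s) in exon 2, 2 in exon 3
# codon 260: 1 base(s) in exon 3, 2 in exon 4
```

Exon 3 is 537 nt, a multiple of 3, yet it still holds parts of two split codons (2 bases of codon 81 at its start, 1 base of codon 260 at its end), which is why checking exons one at a time is misleading.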

(reply by Juke-34)
I see, I understand it now. Thanks!!!
(reply by carlota.rubio)