Question: Why is the 3'-UTR length 1 or 0?
9 months ago by
yancychy10 wrote:

Hi, I want to get the 5'-UTR and 3'-UTR coordinates from the annotation file. I downloaded the "All GENCODE VM24" bed file from the UCSC Genome Browser. The format of the bed file is as follows:

bin            1023
name           ENSMUST00000000090.7
chrom          chr9
strand         +
txStart        57521278
txEnd          57532426
cdsStart       57521327
cdsEnd         57531782
exonCount      5
exonStarts     57521278,57528955,57530240,57531668,57532281,
exonEnds       57521415,57529072,57530362,57531792,57532426,
score          0
name2          Cox5a
cdsStartStat   cmpl
cdsEndStat     cmpl
exonFrames     0,1,1,0,-1,

For 5'UTR, its length is cdsStart - txStart = 57521327 - 57521278 = 49, from (57521278, 57521327 ]

For 3'UTR, its length is txEnd - cdsEnd = 57532426 - 57531782 = 644, from (57531782, 57532426 ]

However, the total length of all exons (the transcript) is only 645 (Ensembl: g=ENSMUSG00000000088; r=9:57521279-57532426; t=ENSMUST00000000090).

Is there anything wrong?

In addition, I find that the 5'UTR and 3'UTR lengths are 1 for some transcripts. Is that reasonable? In the linked post "A: Easy Way To Get 3' Utr Lengths Of A List Of Genes", the 3'UTR length of OR4F5 is 0 and 1.

5utr ensembl_transcript_id  
A    ENSMUST00000054837

3utr ensembl_transcript_id   
G    ENSMUST00000073261
9 months ago by
Sheffield, UK
i.sudbery9.8k wrote:

The end of the coding sequence is not in the last exon but in the penultimate exon: the cdsEnd is 57,531,782 and the last exon starts at 57,532,281. Your calculation of the 3' UTR length therefore includes the last intron.
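To avoid counting introns, the UTR lengths can be summed exon by exon. Below is a minimal sketch of that idea in Python, not a tested tool. The function name `utr_lengths` and its argument names are my own, chosen to mirror the table columns. It assumes 0-based, half-open coordinates as in the UCSC table, and it assumes the common convention that a non-coding transcript has cdsStart equal to cdsEnd, in which case it reports no UTRs.

```python
def utr_lengths(strand, cds_start, cds_end, exon_starts, exon_ends):
    """Return (five_prime_utr_len, three_prime_utr_len), counting exonic bases only.

    exon_starts / exon_ends are the comma-separated strings from the table
    (a trailing comma is allowed). Hypothetical helper, for illustration.
    """
    starts = [int(s) for s in exon_starts.rstrip(",").split(",")]
    ends = [int(e) for e in exon_ends.rstrip(",").split(",")]

    # Assumption: cdsStart == cdsEnd marks a transcript with no CDS,
    # so there is no meaningful 5' or 3' UTR to report.
    if cds_start == cds_end:
        return (0, 0)

    left = 0   # exonic bases before cdsStart in genome coordinates
    right = 0  # exonic bases after cdsEnd in genome coordinates
    for s, e in zip(starts, ends):
        left += max(0, min(e, cds_start) - s)
        right += max(0, e - max(s, cds_end))

    # On the minus strand the 5' UTR lies at the high-coordinate end.
    return (left, right) if strand == "+" else (right, left)


# The Cox5a transcript from the question (ENSMUST00000000090.7)
five, three = utr_lengths(
    "+", 57521327, 57531782,
    "57521278,57528955,57530240,57531668,57532281,",
    "57521415,57529072,57530362,57531792,57532426,",
)
print(five, three)  # 49 155
```

For this transcript the 3' UTR spans the last 10 bases of the fourth exon plus the whole 145-base fifth exon, so the intron between them is not counted.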


As for the 1 bp or 0 bp UTRs, I would guess that they are simply unannotated.

EDIT: The first one is a non-coding RNA. It shouldn't have any UTR (or rather, it's all UTR, so there is no 5' or 3' UTR). The second one is protein coding; it's a histone gene. Histones have odd termination machinery (they don't use polyA signals), but as far as I am aware, they normally do have UTRs.


Thanks very much. The real 3'UTR length is (57531792 - 57531782) + (57532426 - 57532281) = 10 + 145 = 155. Thanks for the explanation of the non-coding RNA and the UTR length. It's very clear.

written 9 months ago by yancychy10