Why is the 3'-UTR length 1 or 0?
4.1 years ago
yancychy ▴ 10

Hi, I want to get the 5'-UTR and 3'-UTR coordinates from the annotation file. I downloaded the "All GENCODE VM24" bed file from the UCSC Genome Browser. The format of the bed file is as follows:

bin            1023
name           ENSMUST00000000090.7
chrom          chr9
strand         +
txStart        57521278
txEnd          57532426
cdsStart       57521327
cdsEnd         57531782
exonCount      5
exonStarts     57521278,57528955,57530240,57531668,57532281,
exonEnds       57521415,57529072,57530362,57531792,57532426,
score          0
name2          Cox5a
cdsStartStat   cmpl
cdsEndStat     cmpl
exonFrames     0,1,1,0,-1,

For 5'UTR, its length is cdsStart - txStart = 57521327 - 57521278 = 49, from (57521278, 57521327 ]

For 3'UTR, its length is txEnd - cdsEnd = 57532426 - 57531782 = 644, from (57531782, 57532426 ]

However, the total length of all exons (the whole transcript) is only 645, so a 644 bp 3'UTR plus a 49 bp 5'UTR cannot fit.
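As a quick check on the 645 figure, the exon lengths can be summed directly from the exonStarts and exonEnds columns. This is only a minimal sketch: the helper names `parse_coords` and `exon_lengths` are made up for illustration, and it assumes the UCSC convention of 0-based, half-open coordinates.

```python
def parse_coords(field):
    """Turn a UCSC comma-separated coordinate list (trailing comma allowed) into ints."""
    return [int(x) for x in field.split(",") if x.strip()]


def exon_lengths(exon_starts, exon_ends):
    """Length of each exon, assuming half-open [start, end) intervals."""
    starts = parse_coords(exon_starts)
    ends = parse_coords(exon_ends)
    return [end - start for start, end in zip(starts, ends)]


# Values copied from the ENSMUST00000000090.7 row above.
exon_starts = "57521278,57528955,57530240,57531668,57532281,"
exon_ends = "57521415,57529072,57530362,57531792,57532426,"

lengths = exon_lengths(exon_starts, exon_ends)
transcript_length = sum(lengths)       # exonic bases only
genomic_span = 57532426 - 57521278     # txEnd - txStart, introns included

print(lengths)
print(transcript_length, genomic_span)
```

The genomic span is much larger than the summed exon length, which already suggests that plain subtraction of genomic coordinates is counting intronic bases.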
https://useast.ensembl.org/Mus_musculus/Transcript/Sequence_cDNA?db=core;g=ENSMUSG00000000088;r=9:57521279-57532426;t=ENSMUST00000000090

Is there anything wrong?

In addition, I find that the 5'UTR and 3'UTR lengths are 1 for some transcripts. Is that reasonable? In the linked post "A: Easy Way To Get 3' Utr Lengths Of A List Of Genes", the 3'UTR length of OR4F5 is 0 and 1.

5utr ensembl_transcript_id  
A    ENSMUST00000054837

3utr ensembl_transcript_id   
G    ENSMUST00000073261
4.1 years ago

The end of the coding sequence is not in the last exon but in the penultimate exon: cdsEnd is 57,531,782, while the last exon starts at 57,532,281. Your calculation of the 3' UTR length (txEnd - cdsEnd) therefore includes the last intron.
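One way to avoid counting introns is to add up only the exonic bases that lie outside the CDS. The sketch below is an illustration, not an official UCSC or Ensembl method: the function name `utr_lengths` is hypothetical, coordinates are assumed to be 0-based and half-open as in UCSC tables, and the swap for minus-strand transcripts reflects the fact that their 5' end sits at the higher genomic coordinate.

```python
def utr_lengths(strand, cds_start, cds_end, exon_starts, exon_ends):
    """Return (five_prime_utr_len, three_prime_utr_len), counting exonic bases only.

    Assumes 0-based, half-open [start, end) coordinates.
    """
    left = 0   # exonic bases before cdsStart in genome order
    right = 0  # exonic bases after cdsEnd in genome order
    for start, end in zip(exon_starts, exon_ends):
        # Part of this exon that lies left of the CDS start.
        left += max(0, min(end, cds_start) - start)
        # Part of this exon that lies right of the CDS end.
        right += max(0, end - max(start, cds_end))
    # On the minus strand the genomic left side is the 3' UTR.
    return (left, right) if strand == "+" else (right, left)


# Values copied from the ENSMUST00000000090.7 row in the question.
exon_starts = [57521278, 57528955, 57530240, 57531668, 57532281]
exon_ends = [57521415, 57529072, 57530362, 57531792, 57532426]

five_utr, three_utr = utr_lengths("+", 57521327, 57531782, exon_starts, exon_ends)
print(five_utr, three_utr)
```

For this transcript the 5' UTR lies entirely within the first exon, so the simple subtraction happened to give the right answer there; only the 3' UTR crosses an intron.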


As for the 1 bp or 0 bp UTRs, I would guess that they are simply unannotated.

EDIT: That first one is a non-coding RNA. It shouldn't have any UTR (or rather it's all UTR, so there is no 5' or 3' UTR). The second one is protein coding. It's a histone gene. Histones have odd termination machinery (they don't use polyA signals), but as far as I was aware, they do normally have UTRs.


Thanks very much. The real 3'UTR length is (57531792 - 57531782) + (57532426 - 57532281) = 10 + 145 = 155. Thanks for the explanation of the non-coding RNA and the UTR length. It's very clear.
