Question

Why does the TxDb transcript length not always match transcript end minus start position?

1

2.7 years ago

Sora Yoon ▴ 20

I have just found an example where biomart's transcript_length is not identical to transcript_end - transcript_start.

          ensembl_gene_id mgi_symbol chromosome_name strand start_position end_position   gene_biotype transcript_start transcript_end strand.1 transcript_length
128537 ENSMUSG00000037860       Aim2               1      1      173178445    173293606 protein_coding        173178445      173293606        1              2839
128538 ENSMUSG00000037860       Aim2               1      1      173178445    173293606 protein_coding        173246762      173255347        1               383
128539 ENSMUSG00000037860       Aim2               1      1      173178445    173293606 protein_coding        173248164      173287285        1               744

Does anyone know why such a discrepancy happens?

Thanks

biomart transcript Txdb length transcription • 1.1k views

ADD COMMENT • link updated 2.7 years ago by i.sudbery 19k • written 2.7 years ago by Sora Yoon ▴ 20

score 3 · Answer 1 · 2021-08-06

2.7 years ago

Pierre Lindenbaum 161k

Does anyone know why such a discrepancy happens?

Life

ADD COMMENT • link 2.7 years ago by Pierre Lindenbaum 161k

0

Thanks. So that means a large portion of the Aim2 gene is intronic.

ADD REPLY • link 2.7 years ago by Sora Yoon ▴ 20

0

Yes, this is almost always the case. Exons make up 2% of the human genome, but transcripts, including introns, cover 40% of the human genome (ref).

ADD REPLY • link 2.7 years ago by i.sudbery 19k

score 2 · Answer 2 · 2021-08-06

2.7 years ago

i.sudbery 19k

transcript_end - transcript_start encompasses the full length of the genomic region associated with the transcript, including intronic sequence, whereas transcript_length is the length of the mature RNA transcript (i.e. just the summed length of the exonic sequences, after removal of the introns by splicing).
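
To make the difference concrete, here is a minimal Python sketch (not how biomart or TxDb compute it internally). The helper names `genomic_span` and `spliced_length` and the three-exon toy transcript are made up for illustration; only the Aim2 start, end and length values are copied from the table in the question. It assumes Ensembl-style 1-based inclusive coordinates, so the span is end - start + 1.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based and inclusive


def genomic_span(start: int, end: int) -> int:
    """Bases covered on the genome from start to end, introns included."""
    return end - start + 1


def spliced_length(exons: List[Exon]) -> int:
    """Length of the mature transcript: the sum of the exon lengths only."""
    return sum(end - start + 1 for start, end in exons)


# Hypothetical three-exon transcript (NOT real Aim2 exon coordinates).
toy_exons = [(100, 199), (500, 549), (900, 999)]
toy_span = genomic_span(toy_exons[0][0], toy_exons[-1][1])
toy_length = spliced_length(toy_exons)
print(f"toy transcript: span={toy_span} bp, spliced length={toy_length} bp")

# (transcript_start, transcript_end, transcript_length) copied from the question.
aim2_rows = [
    (173178445, 173293606, 2839),
    (173246762, 173255347, 383),
    (173248164, 173287285, 744),
]

aim2_summary = []
for start, end, length in aim2_rows:
    span = genomic_span(start, end)
    intronic = span - length  # bases inside the span that are spliced out
    aim2_summary.append((span, length, intronic))
    print(f"span={span} bp, transcript_length={length} bp, intronic={intronic} bp")
```

For the first Aim2 transcript the span is a little over 115 kb while the spliced transcript is under 3 kb, so nearly all of the region between transcript_start and transcript_end is intron.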