Why are there so many start_codon entries for the same gene in hg38.refGene.gtf?
3.7 years ago
ManuelDB ▴ 110

I was expecting around 20 thousand start_codon entries in the refGene.gtf file.

According to my genetics knowledge:

  1. We have approximately 20 thousand protein-coding genes.
  2. There is only one start codon per protein-coding gene.
  3. Only protein-coding genes have a start codon.

If this is correct, why does the following return many start_codon rows for the same gene?

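# Assumption: read_gtf comes from the gtfparse package (the import is not shown in the post).
from gtfparse import read_gtf

gtf_path = ".GTF_files/hg38.refGene.gtf"
ncbiRefSeq = read_gtf(gtf_path)

# Keep only the rows whose feature type is start_codon.
ncbiRefSeq[ncbiRefSeq['feature'] == "start_codon"]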


LIFE.


Ahhh! I know about splicing, but I thought the start codon refers to the gene, not the transcript. I have also seen that there are even more 5UTR entries. Why?

exon           836541
CDS            655401
5UTR           121178
transcript      84787
3UTR            67901
start_codon     64026
stop_codon      64026

Is the total length of a gene not directly provided in this GTF file?
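
Judging only by the feature counts listed above, the file has transcript records but no separate gene records, so a gene's span would have to be derived rather than read off. A minimal sketch, assuming a gene's extent can be taken as the smallest start and largest end over its transcript records on the same chromosome and strand. The sample lines and IDs are made up for illustration, and `gene_spans` is a hypothetical helper, not part of any library:

```python
import re

# Made-up GTF lines for illustration only (hypothetical gene/transcript IDs).
SAMPLE_GTF = "\n".join([
    'chr1\trefGene\ttranscript\t100\t500\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";',
    'chr1\trefGene\ttranscript\t150\t900\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A2";',
    'chr2\trefGene\ttranscript\t1000\t1200\t.\t-\t.\tgene_id "GENE_B"; transcript_id "TX_B1";',
])


def gene_spans(gtf_text):
    """Map (gene_id, chrom, strand) to (min start, max end) over transcript rows."""
    spans = {}
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[2] != "transcript":
            continue
        gene_id = re.search(r'gene_id "([^"]*)"', cols[8]).group(1)
        # Group by chromosome and strand too, so copies of a gene placed at
        # different loci are not merged into one span.
        key = (gene_id, cols[0], cols[6])
        start, end = int(cols[3]), int(cols[4])
        lo, hi = spans.get(key, (start, end))
        spans[key] = (min(lo, start), max(hi, end))
    return spans


for (gene_id, chrom, strand), (start, end) in gene_spans(SAMPLE_GTF).items():
    # GTF coordinates are 1-based and inclusive, hence the +1.
    print(gene_id, chrom, strand, start, end, end - start + 1)
```

Note that this gives the genomic span including introns, which is not the same as the length of any single spliced transcript.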


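I thought the start codon refers to the gene, not the transcript.

That's a wrong assertion. In a GTF file each start_codon line carries a transcript_id, so a gene with several annotated transcripts gets several start_codon lines.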
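
To see this in the data, one can group start_codon rows by gene and collect the transcripts they belong to. A minimal sketch using only the standard library; the sample lines and IDs are made up for illustration, and `start_codons_by_gene` is a hypothetical helper, not part of any library:

```python
import re
from collections import defaultdict

# Made-up GTF lines for illustration only (hypothetical gene/transcript IDs).
SAMPLE_GTF = "\n".join([
    'chr1\trefGene\texon\t50\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";',
    'chr1\trefGene\tstart_codon\t100\t102\t.\t+\t0\tgene_id "GENE_A"; transcript_id "TX_A1";',
    'chr1\trefGene\tstart_codon\t100\t102\t.\t+\t0\tgene_id "GENE_A"; transcript_id "TX_A2";',
    'chr1\trefGene\tstart_codon\t250\t252\t.\t+\t0\tgene_id "GENE_A"; transcript_id "TX_A3";',
    'chr2\trefGene\tstart_codon\t900\t902\t.\t-\t0\tgene_id "GENE_B"; transcript_id "TX_B1";',
])


def parse_attributes(field):
    """Turn 'key "value"; key "value";' into a dict."""
    return dict(re.findall(r'(\S+) "([^"]*)"', field))


def start_codons_by_gene(gtf_text):
    """Map each gene_id to the set of transcript_ids that have a start_codon row."""
    by_gene = defaultdict(set)
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[2] != "start_codon":
            continue
        attrs = parse_attributes(cols[8])
        by_gene[attrs["gene_id"]].add(attrs["transcript_id"])
    return by_gene


by_gene = start_codons_by_gene(SAMPLE_GTF)
for gene_id, transcripts in sorted(by_gene.items()):
    print(gene_id, len(transcripts), sorted(transcripts))
```

In the toy data, GENE_A has three transcripts and therefore three start_codon rows, two of which sit at the same genomic position. Counting distinct genes rather than rows is what should be compared against the roughly 20 thousand expectation.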


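I have also seen that there are even more 5UTR entries. Why?

A UTR can span more than one exon, and each exon's share of the UTR gets its own 5UTR line, so one transcript can contribute several.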
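
A quick way to check this is to count 5UTR rows per transcript: any transcript with a count above one has a 5' UTR split across exons. A minimal sketch using only the standard library; the sample lines and IDs are made up for illustration, and `utr_segments_per_transcript` is a hypothetical helper, not part of any library:

```python
import re
from collections import Counter

# Made-up GTF lines for illustration only (hypothetical gene/transcript IDs).
# TX_A1 has a 5' UTR that starts in exon 1 and continues into exon 2.
SAMPLE_GTF = "\n".join([
    'chr1\trefGene\t5UTR\t100\t150\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";',
    'chr1\trefGene\t5UTR\t300\t320\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";',
    'chr1\trefGene\tstart_codon\t321\t323\t.\t+\t0\tgene_id "GENE_A"; transcript_id "TX_A1";',
    'chr2\trefGene\t5UTR\t950\t999\t.\t+\t.\tgene_id "GENE_B"; transcript_id "TX_B1";',
    'chr2\trefGene\tstart_codon\t1000\t1002\t.\t+\t0\tgene_id "GENE_B"; transcript_id "TX_B1";',
])


def utr_segments_per_transcript(gtf_text, feature="5UTR"):
    """Count how many rows of the given feature type each transcript has."""
    counts = Counter()
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[2] != feature:
            continue
        transcript_id = re.search(r'transcript_id "([^"]*)"', cols[8]).group(1)
        counts[transcript_id] += 1
    return counts


counts = utr_segments_per_transcript(SAMPLE_GTF)
for transcript_id, n in sorted(counts.items()):
    print(transcript_id, n)
```

In the toy data there are three 5UTR rows for only two start_codon rows, which is the same pattern as the counts in the question, just on a small scale.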
