3.7 years ago
ManuelDB
I was expecting around 20 thousand start_codon features in the refGene.gtf file.
My reasoning, based on my genetics knowledge:
- humans have approximately 20 thousand protein-coding genes
- there is only one start codon per protein-coding gene
- only protein-coding genes have a start codon
If this is correct, why does the following return far more than 20 thousand start_codon rows?
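from gtfparse import read_gtf  # assumed source of read_gtf; expects a pandas DataFrame back

gtf_path = ".GTF_files/hg38.refGene.gtf"
ncbiRefSeq = read_gtf(gtf_path)
start_codons = ncbiRefSeq[ncbiRefSeq["feature"] == "start_codon"]
print(len(start_codons))  # number of start_codon rows, not number of genes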
LIFE.
Ahhh! I know about splicing, but I thought a start codon referred to the gene, not the transcript. I have also seen that there are even more 5UTR lines. Why?
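One way to see where the extra rows come from is to compare the number of start_codon rows with the number of distinct gene_id and transcript_id values among them. Below is a minimal sketch in plain Python, assuming a tiny hypothetical GTF excerpt (GENE_A, TX_1 and TX_2 are made up, not taken from hg38.refGene.gtf) with one gene and two spliced transcripts that use different start codons.

```python
import re

# Hypothetical GTF excerpt: one gene (GENE_A) with two transcripts, each with its own start codon.
# Columns: seqname, source, feature, start, end, score, strand, frame, attributes
GTF_TEXT = """\
chr1\tdemo\ttranscript\t1000\t5000\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_1";
chr1\tdemo\tstart_codon\t1200\t1202\t.\t+\t0\tgene_id "GENE_A"; transcript_id "TX_1";
chr1\tdemo\ttranscript\t1000\t5000\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_2";
chr1\tdemo\tstart_codon\t1500\t1502\t.\t+\t0\tgene_id "GENE_A"; transcript_id "TX_2";
"""

# Matches key "value" pairs in the attribute column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf(text):
    """Yield (seqname, feature, start, end, attributes) for each non-comment line."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        attrs = dict(ATTR_RE.findall(fields[8]))
        yield fields[0], fields[2], int(fields[3]), int(fields[4]), attrs


def count_start_codons(text):
    """Return (start_codon rows, distinct genes, distinct transcripts)."""
    rows = [r for r in parse_gtf(text) if r[1] == "start_codon"]
    genes = {r[4]["gene_id"] for r in rows}
    transcripts = {r[4]["transcript_id"] for r in rows}
    return len(rows), len(genes), len(transcripts)


n_rows, n_genes, n_tx = count_start_codons(GTF_TEXT)
print(n_rows, n_genes, n_tx)  # → 2 1 2
```

On the real file I'd expect the row count to track the number of coding transcripts rather than genes. Counting distinct transcript_id values is also safer than counting rows, because the GTF2.2 convention allows a start codon interrupted by an intron to be written as two lines.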
Is the total length of the gene not provided directly in this GTF file?
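If the file has no explicit gene rows (my understanding is that UCSC's refGene GTF only carries transcript-level features), the genomic span of a gene can be derived from the outermost coordinates of its rows. A minimal sketch, assuming a hypothetical excerpt; it groups by (seqname, gene_id) because in refGene the gene_id is a gene symbol that may appear on more than one sequence.

```python
import re
from collections import defaultdict

# Hypothetical excerpt with transcript rows but no "gene" rows.
GTF_TEXT = """\
chr1\tdemo\ttranscript\t1000\t5000\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_1";
chr1\tdemo\ttranscript\t800\t4200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_2";
chr2\tdemo\ttranscript\t300\t900\t.\t-\t.\tgene_id "GENE_B"; transcript_id "TX_3";
"""

GENE_ID_RE = re.compile(r'gene_id "([^"]*)"')


def gene_spans(text):
    """Return {(seqname, gene_id): (start, end, length)} from the outermost coordinates."""
    bounds = defaultdict(lambda: [float("inf"), 0])
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        m = GENE_ID_RE.search(f[8])
        if m is None:
            continue
        key = (f[0], m.group(1))
        start, end = int(f[3]), int(f[4])
        bounds[key][0] = min(bounds[key][0], start)
        bounds[key][1] = max(bounds[key][1], end)
    # GTF coordinates are 1-based and inclusive, hence the +1.
    return {k: (s, e, e - s + 1) for k, (s, e) in bounds.items()}


for key, span in gene_spans(GTF_TEXT).items():
    print(key, span)
```

This gives the genomic span including introns; if "length" means mature transcript length, summing exon lengths per transcript_id is the quantity to compute instead. A symbol mapped to two distant loci on the same chromosome would still be merged by this grouping.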
that's a wrong assertion.
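A UTR can span more than one exon, and each exonic piece is written as its own 5UTR line, so a single transcript can contribute several 5UTR lines.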
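To check this on a file, one can count 5UTR lines and total 5' UTR length per transcript. A minimal sketch, assuming a hypothetical excerpt in which the 5' UTR of TX_1 covers all of exon 1 and the beginning of exon 2, so it appears as two 5UTR lines.

```python
import re
from collections import Counter, defaultdict

# Hypothetical excerpt: TX_1's 5' UTR spans exon 1 and part of exon 2,
# so it is written as two 5UTR lines (one per exonic piece).
GTF_TEXT = """\
chr1\tdemo\texon\t1000\t1100\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_1";
chr1\tdemo\texon\t2000\t2300\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_1";
chr1\tdemo\t5UTR\t1000\t1100\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_1";
chr1\tdemo\t5UTR\t2000\t2049\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_1";
chr1\tdemo\tstart_codon\t2050\t2052\t.\t+\t0\tgene_id "GENE_A"; transcript_id "TX_1";
"""

TX_RE = re.compile(r'transcript_id "([^"]*)"')


def utr5_summary(text):
    """Return {transcript_id: (number of 5UTR lines, total 5' UTR length in bp)}."""
    n_lines = Counter()
    total_bp = defaultdict(int)
    for line in text.splitlines():
        f = line.split("\t")
        if len(f) < 9 or f[2] != "5UTR":
            continue
        tx = TX_RE.search(f[8]).group(1)
        n_lines[tx] += 1
        total_bp[tx] += int(f[4]) - int(f[3]) + 1  # 1-based, inclusive coordinates
    return {tx: (n_lines[tx], total_bp[tx]) for tx in n_lines}


print(utr5_summary(GTF_TEXT))  # → {'TX_1': (2, 151)}
```

One transcript, one start codon, but two 5UTR lines: that alone is enough to make the 5UTR count exceed the start_codon count.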