I have an incomplete GTF file with lines such as:
chr1    hg38_ct_UserTrack_3545  exon    94353   94355   2109    +   .   gene_id "R2_66"; transcript_id "R2_66_1";
This describes an exon. All the lines in my incomplete GTF file describe an exon or CDS.
Question:
I want to fix my GTF file by adding gene and transcript lines. For instance, I need lines like:
chrI   hg38_ct_UserTrack_3545  gene      6790136 6808198 .       +       .       gene_id "R1_102";
and
chrI   hg38_ct_UserTrack_3545  transcript      6790136 6808198 .       +       .       transcript_id "R1_102";
I would like to add annotation at the transcript and gene level. What's the best way?
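I think you can derive each gene's region on the genome from your GTF file itself, ignoring UTR regions. A gene's start and end are the outer boundaries of its first and last exons in genomic order, that is, the lowest start and the highest end among all exons that belong to that gene (same gene_id). Grouping by transcript_id instead gives the transcript boundaries in the same way.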
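A minimal Python sketch of the approach above, under these assumptions: every line has 9 whitespace-separated columns, every line carries both gene_id and transcript_id, each transcript_id belongs to only one gene, and all features of a gene sit on one chromosome and strand. The span of each gene and transcript is taken as the minimum start and maximum end over its exon/CDS lines, and score and frame are written as ".". The function name add_gene_and_transcript_lines is hypothetical, not part of any library. Unlike the transcript line in the question, it writes both gene_id and transcript_id on transcript lines, since GTF2.2 expects both attributes on every line.

```python
import re

# Matches key "value" pairs in the 9th (attribute) column of a GTF line.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def widen(spans, key, chrom, source, start, end, strand):
    """Store a new span under key, or stretch the stored one to cover start..end."""
    if key not in spans:
        spans[key] = [chrom, source, start, end, strand]
    else:
        spans[key][2] = min(spans[key][2], start)
        spans[key][3] = max(spans[key][3], end)


def gtf_line(span, feature, attrs):
    """Build a tab-separated GTF line; score and frame are left as '.'."""
    chrom, source, start, end, strand = span
    return "\t".join([chrom, source, feature, str(start), str(end), ".", strand, ".", attrs])


def add_gene_and_transcript_lines(lines):
    """Return GTF lines with a gene line, then each transcript line followed by its features."""
    genes = {}        # gene_id -> [chrom, source, start, end, strand]
    transcripts = {}  # transcript_id -> same layout
    tx_of_gene = {}   # gene_id -> transcript_ids in first-seen order (dicts keep insertion order)
    body = {}         # transcript_id -> original exon/CDS lines, re-joined with tabs
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split on runs of whitespace at most 8 times, so the attribute column stays whole.
        cols = line.rstrip("\n").split(None, 8)
        chrom, source, strand = cols[0], cols[1], cols[6]
        start, end = int(cols[3]), int(cols[4])
        ids = dict(ATTR_RE.findall(cols[8]))
        gid, tid = ids["gene_id"], ids["transcript_id"]
        widen(genes, gid, chrom, source, start, end, strand)
        widen(transcripts, tid, chrom, source, start, end, strand)
        tids = tx_of_gene.setdefault(gid, [])
        if tid not in tids:
            tids.append(tid)
        body.setdefault(tid, []).append("\t".join(cols))
    out = []
    for gid, tids in tx_of_gene.items():
        out.append(gtf_line(genes[gid], "gene", 'gene_id "%s";' % gid))
        for tid in tids:
            out.append(gtf_line(transcripts[tid], "transcript",
                                'gene_id "%s"; transcript_id "%s";' % (gid, tid)))
            out.extend(body[tid])
    return out
```

To run it on a file (the file name here is a placeholder), pass an open file handle and print the result, e.g. `print("\n".join(add_gene_and_transcript_lines(open("incomplete.gtf"))))`. Genes are emitted in the order they first appear; sort the output afterwards if a downstream tool requires coordinate order.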