I have ANNOVAR output that gives the position of a variant relative to the start of a transcript's CDS. The mRNA sequences in "knownGeneMrna" include the UTRs, so I am trying to find where the CDS starts (that is, to exclude the 5' UTR) using "knownGene.txt", but I don't know what the columns mean. I assumed that column 4 was the start of the transcript and column 6 was the end of the 5' UTR, but this does not make sense for certain transcripts (such as uc001abv.1), where the UTR would then be longer than the transcript itself.
Could somebody give me a description of the columns in "knownGene", or a link to one?
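
If "knownGene.txt" follows the UCSC knownGene table schema (an assumption here: tab-separated columns name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, followed by further columns), then columns 4 and 6 are 0-based genomic coordinates rather than offsets into the transcript. In that case the 5' UTR length has to be summed over the exons, which could explain the apparent mismatch. The following is a minimal Python sketch under that assumption. The function name `five_prime_utr_length` is hypothetical, and it also assumes the "knownGeneMrna" sequence is simply the concatenated exons.

```python
def five_prime_utr_length(line):
    """Return the 5' UTR length (in mRNA bases) for one knownGene.txt row.

    Assumes the UCSC knownGene column order and 0-based, half-open
    genomic coordinates. Returns None for non-coding transcripts
    (cdsStart == cdsEnd).
    """
    fields = line.rstrip("\n").split("\t")
    strand = fields[2]
    cds_start, cds_end = int(fields[5]), int(fields[6])
    # exonStarts / exonEnds are comma-separated lists with a trailing comma
    exon_starts = [int(x) for x in fields[8].split(",") if x]
    exon_ends = [int(x) for x in fields[9].split(",") if x]

    if cds_start == cds_end:
        return None  # no annotated coding region

    utr = 0
    for start, end in zip(exon_starts, exon_ends):
        if strand == "+":
            # exonic bases lying upstream of cdsStart
            utr += max(0, min(end, cds_start) - start)
        else:
            # minus strand: the 5' UTR lies beyond cdsEnd in genomic terms
            utr += max(0, end - max(start, cds_end))
    return utr


if __name__ == "__main__":
    # Made-up row for illustration only, not a real transcript
    row = "ucTEST.1\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,\tP1\tA1"
    print(five_prime_utr_length(row))
```

Under these assumptions, the first CDS base would sit at 0-based index `five_prime_utr_length(row)` in the corresponding mRNA sequence, so a CDS position p (1-based) from ANNOVAR would map to mRNA index `utr + p - 1`.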