I cannot figure out how to pull out the coding sequence from KnownGeneMrna.
I know that the sequences in KnownGeneMrna include the UTRs, so I am computing CDS_start - tx_start from KnownGene to get the offset of the CDS from the beginning of the KnownGeneMrna sequence. The columns in KnownGene are:
The problem is that for some transcripts, for instance uc010nyq.2, the mRNA sequence is shorter than this offset. Why is that, and what am I doing wrong? I have found other related posts, but none that address this point.
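
One likely cause, offered as a sketch rather than a confirmed answer: CDS_start and tx_start are genomic coordinates, so their difference also counts intronic bases, while the mRNA sequence is spliced. The Python below counts only exonic bases to get the CDS position within the mRNA. It assumes UCSC-style 0-based, half-open coordinates, that exon starts and ends are available as lists, that a minus-strand mRNA is stored as the reverse complement of the genome, and that the mRNA aligns to the genome without gaps. The function names and the toy transcript are hypothetical and are not real data for uc010nyq.2.

```python
def parse_ucsc_list(field):
    """Parse a comma-terminated list such as '100,1000,5000,' into ints."""
    return [int(x) for x in field.strip().split(",") if x]


def exonic_overlap(exon_starts, exon_ends, lo, hi):
    """Count exonic bases inside the half-open genomic interval [lo, hi)."""
    return sum(max(0, min(e, hi) - max(s, lo))
               for s, e in zip(exon_starts, exon_ends))


def cds_in_mrna(strand, tx_start, tx_end, cds_start, cds_end,
                exon_starts, exon_ends):
    """Return (start, end) of the CDS in spliced mRNA coordinates.

    Returns None when cds_start >= cds_end, which is assumed here to
    mark a non-coding transcript.
    """
    if cds_start >= cds_end:
        return None
    cds_len = exonic_overlap(exon_starts, exon_ends, cds_start, cds_end)
    if strand == "+":
        # 5' UTR = exonic bases between the transcript start and the CDS start
        five_utr = exonic_overlap(exon_starts, exon_ends, tx_start, cds_start)
    else:
        # On the minus strand the 5' end of the mRNA is at the genomic tx_end
        five_utr = exonic_overlap(exon_starts, exon_ends, cds_end, tx_end)
    return five_utr, five_utr + cds_len


# Toy transcript (made-up numbers): three exons, 350 exonic bases in total
exon_starts = parse_ucsc_list("100,1000,5000,")
exon_ends = parse_ucsc_list("150,1100,5200,")
mrna_length = sum(e - s for s, e in zip(exon_starts, exon_ends))
naive_offset = 1020 - 100  # 920, longer than the 350-base mRNA
plus_cds = cds_in_mrna("+", 100, 5200, 1020, 5050, exon_starts, exon_ends)
print(mrna_length, naive_offset, plus_cds)  # 350 920 (70, 200)
```

With these toy numbers the naive offset (920) exceeds the mRNA length (350), which matches the symptom described above, while the exon-aware offset stays inside the sequence. The coding sequence would then be `mrna_seq[start:end]`, provided the assumptions hold for the transcript in question.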