Question: GTF file: CDS feature is exon 1, but has frame = 1 or 2
Marvin150 wrote (3.1 years ago):

Hello, I'm looking at a record from a GTF file:

18  protein_coding  CDS 2554668 2554691 .   -   2    gene_id "ENSG00000101574"; transcript_id "ENST00000576251"; exon_number "1"; gene_name "METTL4"; gene_biotype "protein_coding"; transcript_name "METTL4-010"; protein_id "ENSP00000460774";

If this is exon_number 1, how can it have a frame of 2 (I expect 0)?

According to the documentation, this means that the third base of this feature is the first base of a codon. So what about the first two bases, given that this is exon 1? Where are the missing bases? Do you see what I mean?
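One detail that helps here: per the GTF spec, frame counts bases from the 5' end of the feature on its own strand. On the + strand that is the start coordinate; on the - strand it is the end coordinate. The sketch below (first_codon_base is a made-up helper name) shows which genomic position is the first base of a complete codon for the record above.

```python
def first_codon_base(start: int, end: int, strand: str, frame: int) -> int:
    """Return the 1-based coordinate of the first base of the first complete
    codon in a GTF CDS feature.

    GTF frame = number of bases to skip from the feature's 5' end (on the
    feature's own strand) before the next codon begins.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    if strand == "+":
        return start + frame  # 5' end is the lower coordinate
    if strand == "-":
        return end - frame    # 5' end is the higher coordinate
    raise ValueError("strand must be '+' or '-'")


# The CDS record from the question: 2554668-2554691, strand -, frame 2
print(first_codon_base(2554668, 2554691, "-", 2))  # 2554689
```

So for this minus-strand record the two skipped bases are 2554691 and 2554690, at the right-hand end of the interval, not the left.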

Devon Ryan (Freiburg, Germany) wrote (3.1 years ago):

For genes on the - strand, the exon you want is not exon 1 but the last exon. You'll note that it has frame 0.
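To see where the frames come from: each CDS segment's frame depends on how many coding bases precede it in transcription order (5' to 3' along the transcript). On the - strand that order runs from the highest coordinate downward, so the segment that opens the CDS is the right-most one in genome coordinates. A minimal sketch, assuming the coding sequence begins with a complete codon (if it does not, the first segment can itself have a non-zero frame); expected_frames and the coordinates in the example are hypothetical.

```python
from typing import List, Tuple


def expected_frames(cds: List[Tuple[int, int]], strand: str) -> List[int]:
    """Given one transcript's CDS segments as 1-based inclusive (start, end)
    pairs, return the frame of each segment in transcription order.

    Transcription order is ascending start on '+' and descending start on '-'.
    """
    ordered = sorted(cds, key=lambda seg: seg[0], reverse=(strand == "-"))
    frames = []
    consumed = 0  # coding bases in all upstream segments
    for start, end in ordered:
        # bases needed to finish a codon left open by the previous segment
        frames.append((3 - consumed % 3) % 3)
        consumed += end - start + 1
    return frames


# Hypothetical minus-strand transcript with three CDS segments.
# Order used: (300, 304), (200, 211), (100, 109)
print(expected_frames([(100, 109), (200, 211), (300, 304)], "-"))  # [0, 1, 1]
```

Comparing these values with column 8 of the real CDS lines is a quick way to check which end of the transcript the frame 0 segment sits on.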

It just clicked, and I now understand what you meant two weeks ago :D

I don't know how best to explain it to others, but I highly recommend this: download the .gtf file from the Ensembl FTP server and check out the following transcript:

awk '$0 ~ /ENST00000576251/ && $3 == "CDS" {print $0}' Homo_sapiens.GRCh37.68.gtf | less
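If awk isn't at hand, a rough Python equivalent of that filter is sketched below. It assumes the file is either plain text or gzip-compressed (decided by a .gz suffix); cds_records is a hypothetical name. Unlike the awk one-liner, which matches the ID anywhere on the line, it only accepts an exact transcript_id attribute.

```python
import gzip
from typing import Iterator, List


def cds_records(path: str, transcript_id: str) -> Iterator[List[str]]:
    """Yield the 9 tab-separated columns of every CDS line of one transcript."""
    opener = gzip.open if path.endswith(".gz") else open
    needle = f'transcript_id "{transcript_id}"'
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip header/comment lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) == 9 and cols[2] == "CDS" and needle in cols[8]:
                yield cols
```

Usage: `for cols in cds_records("Homo_sapiens.GRCh37.68.gtf", "ENST00000576251"): print(cols[3], cols[4], cols[6], cols[7])` prints start, end, strand and frame of each CDS line.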

You will notice that it has 4 exons. Pick exon_number "1" and enter its coordinates into the UCSC Genome Browser (hg19) like this:


Notice how I extended the interval by 1 nucleotide on both sides. In UCSC you will now find the corresponding transcript among others. You will see that the "intron arrows" of this exon point to the LEFT (extending the interval by 1 base makes this visible). That means (as Devon said) that the gene is on the minus strand. And now you can clearly see why it is correct that the left-most position in this CDS does NOT have frame 0. The last exon (exon 4) has frame 0.
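Building that widened UCSC position string from a GTF line can be scripted. A minimal sketch: ucsc_position is a hypothetical helper, and it assumes Ensembl-style chromosome names such as "18", which hg19 in UCSC calls "chr18". It does not handle other naming differences (e.g. Ensembl "MT" vs UCSC "chrM"). GTF and the UCSC position box both use 1-based inclusive coordinates, so only the flank is added.

```python
def ucsc_position(chrom: str, start: int, end: int, flank: int = 1) -> str:
    """Build a UCSC position string from 1-based inclusive GTF coordinates,
    widened by `flank` bases on each side."""
    name = chrom if chrom.startswith("chr") else "chr" + chrom
    return f"{name}:{max(1, start - flank)}-{end + flank}"


# e.g. the CDS line from the question
print(ucsc_position("18", 2554668, 2554691))  # chr18:2554667-2554692
```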

I get it now, thanks for your reply, Devon :)

Or you could just look at column 7 of the GTF (+ or -).
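In a script, column 7 is easy to read once a line is split on tabs (real GTF files separate the 9 columns with tabs, even if the record above displays them as spaces). A minimal sketch, assuming Ensembl-style attributes where every value is double-quoted; parse_gtf_line and the column labels are hypothetical names, not from any library.

```python
import re
from typing import Dict

ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')  # key "value" pairs in column 9


def parse_gtf_line(line: str) -> Dict[str, str]:
    """Split one GTF line into named columns plus its attributes."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    rec = dict(zip(["seqname", "source", "feature", "start", "end",
                    "score", "strand", "frame"], cols[:8]))
    rec.update(ATTR_RE.findall(cols[8]))  # e.g. exon_number -> "1"
    return rec
```

For the record in the question this gives `rec["strand"] == "-"`, `rec["frame"] == "2"` and `rec["exon_number"] == "1"`.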

I think you misunderstood the purpose of my post: the idea was not to go to UCSC to see which strand the gene is on. Instead, the idea was to walk through an example that makes you _understand_ (and see with your own eyes) why exon 1 doesn't necessarily have frame 0.

