Question: GTF file: CDS feature is exon 1, but has frame = 1 or 2
Marvin (Australia) wrote, 3.1 years ago:

Hello, I'm looking at a record from a GTF file:

18  protein_coding  CDS 2554668 2554691 .   -   2    gene_id "ENSG00000101574"; transcript_id "ENST00000576251"; exon_number "1"; gene_name "METTL4"; gene_biotype "protein_coding"; transcript_name "METTL4-010"; protein_id "ENSP00000460774";

If this is exon_number 1, how can it have a frame of 2? I would expect 0.

According to the documentation, a frame of 2 means that the third base of this sequence is the first base of a codon. But since this is exon 1, what about the first two bases of the sequence? Where is the missing base? Do you know what I mean?

Devon Ryan (Freiburg, Germany) wrote, 3.1 years ago:

For genes on the - strand, you don't want exon 1, but the last exon. You'll note it has frame 0.


It just clicked, and I now understand what you meant two weeks ago :D

I do not know how best to explain it to others, but I highly recommend this: download the .gtf file from the Ensembl FTP server and check out the following transcript:

awk '$0 ~ /ENST00000576251/ && $3 == "CDS" {print $0}' Homo_sapiens.GRCh37.68.gtf | less
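For anyone who prefers Python to awk, the same filter can be sketched roughly as below. This is only an illustration: `cds_records` is a made-up helper name, the sample line is the record quoted in the question (rebuilt with tabs), and it assumes the attribute column uses the `key "value";` style shown in that record.

```python
import re

# The CDS record quoted in the question, rebuilt with tab separators.
SAMPLE = "\t".join([
    "18", "protein_coding", "CDS", "2554668", "2554691", ".", "-", "2",
    'gene_id "ENSG00000101574"; transcript_id "ENST00000576251"; '
    'exon_number "1"; gene_name "METTL4"; gene_biotype "protein_coding"; '
    'transcript_name "METTL4-010"; protein_id "ENSP00000460774";',
])


def cds_records(lines, transcript_id):
    """Hypothetical helper: yield (exon_number, start, end, strand, frame)
    for every CDS line that belongs to the given transcript."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        # Assumes attributes look like: key "value"; key "value"; ...
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', cols[8]))
        if attrs.get("transcript_id") != transcript_id:
            continue
        yield (int(attrs["exon_number"]), int(cols[3]), int(cols[4]),
               cols[6], int(cols[7]))


# With the real file you would pass an open file handle instead of [SAMPLE],
# e.g. open("Homo_sapiens.GRCh37.68.gtf").
for row in sorted(cds_records([SAMPLE], "ENST00000576251")):
    print(row)  # (1, 2554668, 2554691, '-', 2)
```

Sorting by the first tuple element lists the CDS records in exon_number order, with strand and frame side by side.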

You will notice it has 4 exons. Pick exon_number "1" and enter its coordinates into the UCSC Genome Browser (hg19) like this:

chr18:2554667-2554692

Notice how I extended the interval by 1 nucleotide on both sides. In UCSC you will now find the corresponding transcript among others. You will see that the "intron arrows" of this exon point to the LEFT (extending the interval by 1 base makes this visible). That means (as Devon said) that the gene is on the minus strand. And now you can clearly see why it is correct that the left-most position in this CDS does NOT have frame 0. The last exon (exon 4) has frame 0.
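To put numbers on this, here is a small Python sketch of the arithmetic. It assumes the GTF convention that the frame is counted from the 5' end of the feature, which for a minus-strand record is the end coordinate (column 5) rather than the start (column 4). The function names are made up for illustration, and `next_frame` only shows what frame a following CDS segment would need for the reading frame to continue unbroken; it does not look anything up in the annotation.

```python
def first_whole_codon(start, end, strand, frame):
    """Genomic position of the first base of the first whole codon.

    Assumption: frame = number of bases to skip from the feature's 5' end.
    On the + strand the 5' end is `start`; on the - strand it is `end`.
    """
    if strand == "+":
        return start + frame
    if strand == "-":
        return end - frame
    raise ValueError("strand must be '+' or '-'")


def next_frame(length, frame):
    """Frame the next CDS segment (in transcript order) would need so that
    the codons continue without a gap."""
    leftover = (length - frame) % 3  # bases of a partial codon at the 3' end
    return (3 - leftover) % 3


# The record from the question: 2554668-2554691 on the - strand, frame 2.
start, end, strand, frame = 2554668, 2554691, "-", 2
length = end - start + 1  # 24 bases

print(first_whole_codon(start, end, strand, frame))  # 2554689
print(next_frame(length, frame))                     # 2
```

So for this record the two skipped bases sit at the right-hand end of the interval (2554691 and 2554690), because that is where a minus-strand feature begins.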

I got it now, thanks for your reply Devon :)

Reply by Marvin, written 3.0 years ago

Or you could just look at column 7 of the GTF, which gives the strand (+ or -).

Reply by Emily_Ensembl, written 3.0 years ago

I think you misunderstood the purpose of my post. The idea was not to go to UCSC to see which strand the gene is on. Instead, the idea was to walk through an example that makes you _understand_ (and see with your own eyes) why exon 1 doesn't necessarily have frame 0.

Reply by Marvin, written 3.0 years ago