I've recently noticed weird entries in hg37.63 from Ensembl. As an example, here is the first exon of transcript ENST00000310701:
1       protein_coding  exon    148025761       148025848       .       -       .        gene_id "ENSG00000122497"; transcript_id "ENST00000310701"; exon_number "1"; gene_name "NBPF14"; transcript_name "NBPF14-001";
1       protein_coding  CDS     148025761       148025848       .       -       2        gene_id "ENSG00000122497"; transcript_id "ENST00000310701"; exon_number "1"; gene_name "NBPF14"; transcript_name "NBPF14-001"; protein_id "ENSP00000309907";
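This seems to be a protein-coding transcript. The exon and the CDS start and end at the same positions, which means this exon contains no UTR.
Here is the weird part: the transcript is on the minus strand, so its first base is 148025848. If you query Ensembl for variants at that position and at the base immediately upstream of it (148025849), you get: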
Uploaded Variation  Location    Allele  Gene    Feature Feature type    Consequence Position in cDNA    Position in CDS Position in protein Amino acid change   Codon change    Co-located Variation    Extra
1_148025849_A   1:148025849 A   ENSG00000122497 ENST00000310701 Transcript  UPSTREAM    -   -   -   -   -   -   -
1_148025848_A   1:148025848 A   ENSG00000122497 ENST00000310701 Transcript  SYNONYMOUS_CODING   1   2   1   X   nAa/nTa -   -
So the start base (148025848) is the SECOND base of the first codon. If you take a detailed look at the GTF record, you'll notice a '2' in the 'frame' column.
The question is: Considering that the transcript has no UTR, is there a valid reason for the first base of the first exon to be the second base of the CDS?
I guess an alternative question is: Am I interpreting this data incorrectly, or does this look like a bug?
According to my interpretation of the GTF 2.2 specification (http://mblab.wustl.edu/GTF22.html), the "frame" calculation on these transcripts seems to be incorrect.
It looks like there are around 5000 transcripts in hg37.63 that may have a similar problem.
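To make "a similar problem" concrete, here is a minimal Python sketch of the kind of check that would flag such transcripts. It is an illustration under stated assumptions, not necessarily the exact procedure behind the count above: it assumes a standard tab-delimited GTF with 9 columns, treats the most 5' exon and CDS record of each transcript as "first" (highest end coordinate on the minus strand, lowest start on the plus strand), and flags a transcript when those two share the same 5' coordinate and the CDS frame is 1 or 2. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTRIBUTE_RE = re.compile(r'(\w+) "([^"]*)"')


def five_prime_feature(features, strand):
    """Return the feature that comes first in transcription order."""
    if strand == "-":
        return max(features, key=lambda f: f["end"])
    return min(features, key=lambda f: f["start"])


def five_prime_coord(feature, strand):
    """Return the genomic coordinate of the feature's 5' end."""
    return feature["end"] if strand == "-" else feature["start"]


def find_suspect_transcripts(lines):
    """Flag transcripts with no 5' UTR whose first CDS has frame 1 or 2.

    `lines` is any iterable of GTF lines (assumed tab-delimited).
    """
    per_transcript = defaultdict(lambda: {"exon": [], "CDS": []})
    strands = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] not in ("exon", "CDS"):
            continue
        attrs = dict(ATTRIBUTE_RE.findall(fields[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        per_transcript[tid][fields[2]].append(
            {"start": int(fields[3]), "end": int(fields[4]), "frame": fields[7]}
        )
        strands[tid] = fields[6]

    suspects = []
    for tid, feats in per_transcript.items():
        if not feats["exon"] or not feats["CDS"]:
            continue  # non-coding transcript or incomplete record set
        strand = strands[tid]
        first_exon = five_prime_feature(feats["exon"], strand)
        first_cds = five_prime_feature(feats["CDS"], strand)
        no_five_prime_utr = (
            five_prime_coord(first_exon, strand)
            == five_prime_coord(first_cds, strand)
        )
        if no_five_prime_utr and first_cds["frame"] in ("1", "2"):
            suspects.append(tid)
    return sorted(suspects)
```

Passing an open file handle for the GTF to `find_suspect_transcripts` returns the sorted transcript IDs that match this pattern; ENST00000310701 from the example above is one of them.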