Inconsistent GTF from UCSC browser vs genePredToGtf
7.9 years ago

There seem to be some inconsistencies between GTF records from the UCSC Table Browser and those in files generated with genePredToGtf (from the UCSC utilities).

For example, on the table browser I selected the vegaGene table to search for transcript OTTHUMT00000097860, and I get

chr1    hg19_vegaGene   start_codon 865692  865694  0.000000    +   .   gene_id "OTTHUMT00000097860"; transcript_id "OTTHUMT00000097860"; 
chr1    hg19_vegaGene   CDS 865692  865716  0.000000    +   2   gene_id "OTTHUMT00000097860"; transcript_id "OTTHUMT00000097860"; 
chr1    hg19_vegaGene   exon    865692  865716  0.000000    +   .   gene_id "OTTHUMT00000097860"; transcript_id "OTTHUMT00000097860"; 
chr1    hg19_vegaGene   CDS 866419  866469  0.000000    +   1   gene_id "OTTHUMT00000097860"; transcript_id "OTTHUMT00000097860"; 
...

If instead I use genePredToGtf the start_codon record seems to be missing:

curl -s http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/vegaGene.txt.gz | gunzip -c \
| cut -f 2- \
| genePredToGtf file stdin stdout \
| grep 'OTTHUMT00000097860' \
| grep 'codon'

This returns the stop codon but not the start codon:

chr1    stdin   stop_codon  879531  879533  .   +   0   gene_id "SAMD11"; transcript_id "OTTHUMT00000097860"; exon_number "12"; exon_id "OTTHUMT00000097860.12"; gene_name "SAMD11";

Am I missing something?
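
To make the comparison between the two outputs easier to repeat, here is a minimal sketch of a helper that reports which codon records a GTF stream contains for one transcript. The function name codon_summary is made up for illustration, and it assumes tab-separated GTF with the feature type in column 3 and the attributes in column 9.

```shell
# Report whether a GTF stream holds start_codon / stop_codon records
# for a given transcript. codon_summary is a hypothetical helper name.
# Usage: codon_summary TRANSCRIPT_ID < file.gtf
codon_summary() {
    awk -F '\t' -v id="$1" '
        # Keep only *_codon features whose attributes name this transcript.
        index($9, "transcript_id \"" id "\"") && $3 ~ /_codon$/ { seen[$3] = 1 }
        END {
            printf "start_codon=%s stop_codon=%s\n",
                (("start_codon" in seen) ? "yes" : "no"),
                (("stop_codon" in seen) ? "yes" : "no")
        }'
}
```

Piping the Table Browser download and the genePredToGtf output through the same function should show at a glance where the two disagree.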

GTF genePredToGtf UCSC • 3.0k views
7.9 years ago
Denise CS ★ 5.2k

Transcript OTTHUMT00000097860 is incomplete at the 5' end: the CDS start was not found (check 'remarks' in this page). In the GTF file from Ensembl, that information is available as "cds_start_NF".
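
The same kind of flag may also be visible on the UCSC side. The sketch below assumes that vegaGene.txt.gz follows the extended genePred layout and that the leading bin column has already been dropped with cut -f 2-, which would put the transcript name in column 1, cdsStartStat in column 13 and cdsEndStat in column 14. Those column positions, and the helper name cds_stat, are assumptions; checking them against the table's .sql schema first would be wise.

```shell
# Print the CDS completeness flags for one transcript from a genePredExt
# stream. cds_stat is a hypothetical helper name.
# Assumed layout (bin column already removed): name = column 1,
# cdsStartStat = column 13, cdsEndStat = column 14.
cds_stat() {
    awk -F '\t' -v id="$1" '
        $1 == id { printf "%s cdsStartStat=%s cdsEndStat=%s\n", $1, $13, $14 }'
}

# Example use (needs network access, so it is left commented out):
#   curl -s http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/vegaGene.txt.gz \
#     | gunzip -c | cut -f 2- | cds_stat OTTHUMT00000097860
```

In that layout the two fields take one of the values none, unk, incmpl or cmpl, so a transcript with an unannotated CDS start would be expected to show something other than cmpl for cdsStartStat.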


Thanks a lot for digging this information out; it makes sense. This seems to suggest that the output of genePredToGtf is more accurate than the Table Browser's (?)


Glad to help and learn too :) I'm not sure about accurate versus inaccurate, as I'm not familiar with the UCSC utilities or the Table Browser. But I'd expect that what I see in the Ensembl browser is what I get from the FTP site, BioMart, the REST API, or the Perl APIs. The underlying database is the same; only the mode of access differs.
