Question: Inconsistent GTF from UCSC browser vs genePredToGtf
dariober (WCIP | Glasgow | UK) wrote, 3.0 years ago:

There seem to be some inconsistencies between GTF records from the UCSC Table Browser and those in files generated with genePredToGtf (from the UCSC utilities).

For example, in the Table Browser I selected the vegaGene table, searched for transcript OTTHUMT00000097860, and got:

chr1    hg19_vegaGene   start_codon 865692  865694  0.000000    +   .   gene_id "OTTHUMT00000097860"; transcript_id "OTTHUMT00000097860"; 
chr1    hg19_vegaGene   CDS 865692  865716  0.000000    +   2   gene_id "OTTHUMT00000097860"; transcript_id "OTTHUMT00000097860"; 
chr1    hg19_vegaGene   exon    865692  865716  0.000000    +   .   gene_id "OTTHUMT00000097860"; transcript_id "OTTHUMT00000097860"; 
chr1    hg19_vegaGene   CDS 866419  866469  0.000000    +   1   gene_id "OTTHUMT00000097860"; transcript_id "OTTHUMT00000097860"; 
...

If I use genePredToGtf instead, the start_codon record seems to be missing:

curl -s http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/vegaGene.txt.gz | gunzip -c \
| cut -f 2- \
| genePredToGtf file stdin stdout \
| grep 'OTTHUMT00000097860' \
| grep 'codon'

This returns the stop codon but not the start codon:

chr1    stdin   stop_codon  879531  879533  .   +   0   gene_id "SAMD11"; transcript_id "OTTHUMT00000097860"; exon_number "12"; exon_id "OTTHUMT00000097860.12"; gene_name "SAMD11";

Am I missing something?
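
To compare the two outputs transcript by transcript, a small Python sketch like the one below could list which of start_codon/stop_codon each transcript carries in a GTF file. It assumes standard tab-separated, 9-column GTF with a transcript_id attribute in column 9; the function names are hypothetical.

```python
from collections import defaultdict

CODON_FEATURES = ("start_codon", "stop_codon")


def codon_features(gtf_lines):
    """Map each transcript_id to the set of codon features it has in a GTF."""
    found = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a 9-column GTF record
        feature, attributes = fields[2], fields[8]
        tid = None
        for attr in attributes.split(";"):
            attr = attr.strip()
            if attr.startswith("transcript_id "):
                tid = attr.split(" ", 1)[1].strip('"')
                break
        if tid is None:
            continue
        codons = found[tid]  # creates an empty set the first time tid is seen
        if feature in CODON_FEATURES:
            codons.add(feature)
    return found


def missing_codons(found):
    """Return {transcript_id: sorted list of absent codon features}."""
    wanted = set(CODON_FEATURES)
    return {
        tid: sorted(wanted - codons)
        for tid, codons in found.items()
        if wanted - codons
    }
```

For instance, running missing_codons(codon_features(open(path))) on the genePredToGtf output (path hypothetical) should, going by the grep above, report OTTHUMT00000097860 as missing only its start_codon.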

Answer by Denise - Open Targets (UK, Hinxton, EMBL-EBI), 3.0 years ago:

Transcript OTTHUMT00000097860 is incomplete at the 5' end: its CDS start was not found (check the 'remarks' on the transcript's page). In the Ensembl GTF file this information is available as the "cds_start_NF" tag.
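
As a rough illustration of using that flag, a hypothetical helper like this could collect the transcripts tagged "cds_start_NF" in an Ensembl GTF. It assumes the flag appears as a tag "cds_start_NF" attribute in column 9 (a record can carry several tag attributes). Note that Ensembl GTFs use Ensembl transcript IDs (ENST...) in transcript_id, not the OTTHUMT IDs used above, so mapping between the two is left out here.

```python
import re

# A record can carry several tag "..." attributes; capture all of them.
TAG_RE = re.compile(r'\btag "([^"]+)"')
TID_RE = re.compile(r'\btranscript_id "([^"]+)"')


def transcripts_with_tag(gtf_lines, tag="cds_start_NF"):
    """Return the transcript_ids whose GTF records carry the given tag."""
    hits = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip "#!genome-build"-style header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a 9-column GTF record
        attributes = fields[8]
        tid = TID_RE.search(attributes)
        if tid and tag in TAG_RE.findall(attributes):
            hits.add(tid.group(1))
    return hits
```

It can be fed any iterable of lines, e.g. gzip.open(path, "rt") for a compressed Ensembl GTF (path hypothetical).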


dariober replied, 3.0 years ago:

Thanks a lot for digging this information out; it makes sense. This seems to suggest that the output of genePredToGtf is more accurate than the Table Browser's(?)
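
On the UCSC side, extended genePred tables carry cdsStartStat and cdsEndStat columns (values none, unk, incmpl, cmpl). My assumption, not checked against the genePredToGtf source, is that genePredToGtf only writes a start_codon record when the start of the CDS is marked cmpl, which would explain the difference. The sketch below inspects those columns, assuming vegaGene.txt follows the genePredExt layout with a leading bin column (the column that cut -f 2- drops in the pipeline above). The helper names are hypothetical.

```python
# genePredExt column order, after the leading "bin" column of UCSC table dumps.
GENEPRED_EXT_COLS = [
    "name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd",
    "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]


def cds_status(lines, has_bin=True):
    """Yield (name, name2, strand, cdsStartStat, cdsEndStat) per genePredExt row."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if has_bin:
            fields = fields[1:]  # drop the bin column, like `cut -f 2-`
        if len(fields) < len(GENEPRED_EXT_COLS):
            continue  # plain genePred without the extended status columns
        row = dict(zip(GENEPRED_EXT_COLS, fields))
        yield (row["name"], row["name2"], row["strand"],
               row["cdsStartStat"], row["cdsEndStat"])


def five_prime_cds_status(strand, cds_start_stat, cds_end_stat):
    """Status of the CDS end that should hold the start codon.

    Assumption: the status columns follow genomic orientation like
    cdsStart/cdsEnd, so on the minus strand the start codon sits at cdsEnd.
    """
    return cds_end_stat if strand == "-" else cds_start_stat
```

To use it, iterate over gzip.open("vegaGene.txt.gz", "rt") (the file from the download URL above) and filter on name == "OTTHUMT00000097860". If Denise's explanation holds, the 5' status for that + strand transcript should be something other than cmpl.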

Denise - Open Targets replied, 3.0 years ago:

Glad to help, and to learn too :) I'm not sure which one is more accurate, as I'm not familiar with the UCSC utilities or the Table Browser. For Ensembl, though, I'd expect what I see in the Ensembl browser to match what I get from the FTP site, BioMart, the REST API or the Perl APIs: the underlying database is the same; only the mode of access differs.