Question

Convert UCSC Files to GTF for TopHat

11.8 years ago

Richard ▴ 590

I have UCSC data for hg19 that I downloaded about 6 months ago, and now I need GTF files for the same data to use with TopHat. Should I convert the table dumps I have for knownGene and refGene to GTF, or is it safe to re-download them as GTF and assume the files are based on the same version of the annotation?

I tried converting my files to GTF, but without success. My downloaded UCSC annotations look like this:

> ... 585     NR_026818       chr1    -       34610   36081   36081   36081   3       34610,35276,35720,      35174,35481,36081,      0       FAM138A unk     unk     -1,-1,-1, ...

I used awk to produce a file that looks like this (for example):

> 1       hg19_refGene    exon    11874   12227   .       +       0       gene_id "uc001aaa.3";
> 1       hg19_refGene    exon    12613   12721   .       +       1       gene_id "uc001aaa.3";
> 1       hg19_refGene    exon    13221   14409   .       +       2       gene_id "uc001aaa.3";
> 1       hg19_refGene    exon    11874   12227   .       +       0       gene_id "uc010nxq.1";
> 1       hg19_refGene    exon    12595   12721   .       +       1       gene_id "uc010nxq.1";

However, TopHat reports "Warning: TopHat did not find any junctions in GTF file", so my file evidently does not meet TopHat's requirements for these annotations, and I don't know what I am missing.

Alternatively, if someone can tell me whether the RefSeq and knownGene files stay unchanged once they are released for a genome build, I could simply re-download the data in the format I need. Until I know that, however, I am wary of using whatever files are available now, since I'm not confident that the data has not changed.

tophat ucsc gtf • 3.9k views

score 0 · Answer 1 · 2012-06-27

11.8 years ago

Ashwin ▴ 110

Try using genePredToGtf from UCSC
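
genePredToGtf is the tool to reach for here. Note that the downloaded refGene table carries a leading `bin` column that is not part of the genePred format, so it usually has to be dropped first; a commonly quoted invocation is `cut -f 2- refGene.txt | genePredToGtf file stdin refGene.gtf`, but check the tool's own usage message before relying on it. To show what the conversion has to produce, here is a minimal Python sketch, not a replacement for genePredToGtf. It assumes the tab-separated refGene column layout seen in the question (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2, ...). The function name `refgene_row_to_gtf` and the `hg19_refGene` source label are made up for illustration. The key difference from the awk output in the question is that every exon line carries a `transcript_id` as well as a `gene_id`, which GTF requires and which lets exons be grouped into transcripts.

```python
def refgene_row_to_gtf(row, source="hg19_refGene"):
    """Turn one tab-separated refGene row into GTF exon lines.

    Hypothetical helper for illustration only. Column positions are
    assumed from the refGene row shown in the question.
    """
    fields = row.rstrip("\n").split("\t")
    name, chrom, strand = fields[1], fields[2], fields[3]
    # exonStarts / exonEnds are comma-separated lists with a trailing comma
    starts = [int(x) for x in fields[9].split(",") if x]
    ends = [int(x) for x in fields[10].split(",") if x]
    gene = fields[12]  # name2 column, used here as the gene_id

    attributes = f'gene_id "{gene}"; transcript_id "{name}";'
    lines = []
    for start, end in zip(starts, ends):
        lines.append("\t".join([
            chrom,           # keep the UCSC name (chr1), not a bare "1"
            source,
            "exon",
            str(start + 1),  # UCSC starts are 0-based, GTF is 1-based
            str(end),        # ends need no shift (half-open -> inclusive)
            ".",             # score
            strand,
            ".",             # frame is only meaningful for CDS features
            attributes,
        ]))
    return lines


if __name__ == "__main__":
    # The refGene row quoted in the question, rebuilt with tabs
    example = "\t".join([
        "585", "NR_026818", "chr1", "-", "34610", "36081", "36081", "36081",
        "3", "34610,35276,35720,", "35174,35481,36081,", "0", "FAM138A",
        "unk", "unk", "-1,-1,-1,",
    ])
    for line in refgene_row_to_gtf(example):
        print(line)
```

Two caveats about the sketch: the same RefSeq accession can appear on more than one row of refGene, and nothing here makes the resulting `transcript_id` values unique; and the chromosome names in the GTF have to match the names in the Bowtie index TopHat is run against, so stripping `chr` as in the question's output is only right if the index also uses bare names.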
