GTF to match transcriptome data in TopHat
14 months ago
blur ▴ 180


I am trying to do ribosome profiling, and to that end I have been trying to align reads to the human transcriptome, with no success. My aligner is TopHat. The hg19 transcriptome FASTA file looks like this:


I need a GTF file for the run (at least I assume that I do?) but the GTF file that I have downloaded from UCSC looks like this:

chr1    hg19_knownGene  exon    11874   12227   0.000000    +   .   gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";

Upon running I get an error message:

[2020-02-04 15:15:08] Building Bowtie index from ucsc_hg19.fa

Looking through the posts here, I think the problem is that my GTF does not match the transcriptome. I tried to find a transcriptome-based GTF at UCSC, but couldn't find one. Or am I doing this the wrong way, and should I have used a differently built reference? I have downloaded the genomic data that includes the protein-coding genes. The sequence names in this file look like this:

>hg19_knownGene_uc001aaa.3 range=chr1:11874-14409 5'pad=0 3'pad=0 strand=+ repeatMasking=none
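The suspected mismatch can be checked directly. TopHat expects the first column of the GTF (the seqname) to match the reference sequence names in the Bowtie index. The sketch below is only an illustration of that comparison, not part of TopHat: the helper names `fasta_names` and `gtf_seqnames` are hypothetical, the sequence line `ACGT` is a placeholder, and it assumes the sequence name is the first whitespace-delimited token of the FASTA header and that the GTF is tab-separated.

```python
def fasta_names(lines):
    """Collect sequence names: the first token after '>' on each header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip a bare '>' with no name
                names.add(tokens[0])
    return names


def gtf_seqnames(lines):
    """Collect the values of the first (seqname) column of a tab-separated GTF."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(line.split("\t")[0])
    return names


# The header and GTF record quoted in the post; 'ACGT' is a placeholder sequence.
fasta = [
    ">hg19_knownGene_uc001aaa.3 range=chr1:11874-14409 5'pad=0 3'pad=0 "
    "strand=+ repeatMasking=none",
    "ACGT",
]
gtf = [
    "\t".join([
        "chr1", "hg19_knownGene", "exon", "11874", "12227", "0.000000",
        "+", ".", 'gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";',
    ])
]

shared = fasta_names(fasta) & gtf_seqnames(gtf)
print("FASTA names:", fasta_names(fasta))
print("GTF seqnames:", gtf_seqnames(gtf))
print("Shared names:", shared)
```

With these two records the intersection is empty: the FASTA names sequences by transcript (`hg19_knownGene_uc001aaa.3`) while the GTF uses chromosome coordinates (`chr1`), so the annotation cannot be placed on this reference.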

The end goal is to look only at the transcriptome data, compute RPKM, and check expression.
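For reference, RPKM (reads per kilobase of transcript per million mapped reads) is the read count scaled by transcript length and sequencing depth. The following is a minimal sketch of that arithmetic; the function name `rpkm` and the numbers in the example are hypothetical and are not taken from any real run.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """RPKM = reads * 1e9 / (transcript length in bp * total mapped reads)."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp transcript,
# out of 10 million mapped reads in the library.
value = rpkm(500, 2000, 10_000_000)
print(value)
```

The factor of 1e9 combines the per-kilobase (1e3) and per-million-reads (1e6) scalings.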

Any help would be greatly appreciated,

