GTF to match transcriptome data in TopHat
4.2 years ago
blur ▴ 280

Hi,

I am trying to do ribosome profiling, and to that end I have been trying, without success, to align reads to the human transcriptome. My aligner is TopHat. The hg19 transcriptome FASTA file looks like this:

>uc001aaa.3
cttgccgtcagccttttctttgacctcttc

I need a GTF file for the run (at least I assume that I do?) but the GTF file that I have downloaded from UCSC looks like this:

chr1    hg19_knownGene  exon    11874   12227   0.000000    +   .   gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
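To make the naming difference concrete, here is a minimal sketch (an illustration, not part of the original run) that compares the sequence names in a FASTA file with the first column and the `transcript_id` attribute of a GTF. The helper names `fasta_names` and `gtf_names` are hypothetical, and the inputs are just the two records quoted above, with the GTF line assumed to be tab-separated. TopHat's documentation states that the first GTF column must match the reference names in the Bowtie index; whether that mismatch explains the failure reported here is not established.

```python
import re

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')

def fasta_names(lines):
    """Collect sequence names: the first whitespace-delimited token after '>'."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names

def gtf_names(lines):
    """Collect the seqname column (column 1) and transcript_id values of a GTF."""
    seqnames, transcripts = set(), set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        seqnames.add(fields[0])
        match = TRANSCRIPT_RE.search(fields[8])
        if match:
            transcripts.add(match.group(1))
    return seqnames, transcripts

# The two records quoted in the post (GTF fields assumed to be tab-separated).
fasta = [">uc001aaa.3", "cttgccgtcagccttttctttgacctcttc"]
gtf = ['chr1\thg19_knownGene\texon\t11874\t12227\t0.000000\t+\t.\t'
       'gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";']

names = fasta_names(fasta)
seqnames, transcripts = gtf_names(gtf)

print("FASTA names also used as GTF seqnames:", names & seqnames)
print("FASTA names also used as transcript_id:", names & transcripts)
```

For these two records the FASTA name matches the GTF `transcript_id` but not the GTF seqname (`chr1`), which is the mismatch suspected in the post.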

Upon running, I get an error message:

[2020-02-04 15:15:08] Building Bowtie index from ucsc_hg19.fa
    [FAILED]

Looking through the posts here, I think the problem is that my GTF does not match the transcriptome. I have tried to find a transcriptome-based GTF at UCSC, but I could not find one. Or am I going about this the wrong way, and should I have used a differently built reference? I have also downloaded the genomic data that includes the protein-coding genes. The sequence names in that file look like this:

>hg19_knownGene_uc001aaa.3 range=chr1:11874-14409 5'pad=0 3'pad=0 strand=+ repeatMasking=none
cttgccgtcagccttttctttgacctcttctttctgttcatgtgtatttg
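A header in this second style carries both the transcript ID and its genomic range, so the two naming schemes can be related by parsing it. The following is a small sketch under the assumption that every header follows exactly the layout of the example above. The regular expression and the function name `parse_header` are illustrative, not part of any UCSC or TopHat tool.

```python
import re

# Pattern for headers shaped like the example in the post:
# >hg19_knownGene_<transcript> range=<chrom>:<start>-<end> ... strand=<+|->
HEADER_RE = re.compile(
    r"^>hg19_knownGene_(?P<transcript>\S+)\s+"
    r"range=(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)"
    r".*?strand=(?P<strand>[+-])"
)

def parse_header(header):
    """Return transcript ID, chromosome, range and strand, or None if no match."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    return {
        "transcript": match.group("transcript"),
        "chrom": match.group("chrom"),
        "start": int(match.group("start")),
        "end": int(match.group("end")),
        "strand": match.group("strand"),
    }

header = (">hg19_knownGene_uc001aaa.3 range=chr1:11874-14409 "
          "5'pad=0 3'pad=0 strand=+ repeatMasking=none")
info = parse_header(header)
print(info)
```

The parsed transcript ID (`uc001aaa.3`) is the same identifier used in the first FASTA file and in the GTF attributes, while the parsed chromosome (`chr1`) is what appears in the first GTF column.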

The end goal is to look only at the transcriptome data, compute RPKM, and check expression.
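For reference, RPKM (reads per kilobase of transcript per million mapped reads) is conventionally defined as 10^9 × reads mapped to the transcript / (total mapped reads × transcript length in bp). A minimal sketch of that formula follows. The function name `rpkm` and the numbers in the usage line are made up for illustration and do not come from this dataset.

```python
def rpkm(reads_on_transcript, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return 1e9 * reads_on_transcript / (total_mapped_reads * transcript_length_bp)

# Hypothetical numbers: 500 reads on a 2,000 bp transcript, 10 million mapped reads.
example = rpkm(500, 2000, 10_000_000)
print(example)
```

With these hypothetical inputs, 500 reads on a 2 kb transcript is 250 reads per kilobase, and dividing by 10 (million mapped reads) gives an RPKM of 25.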

Any help would be greatly appreciated.

TopHat transcriptome GTF • 696 views