Tophat2 with GTF annotation file made from blast results
5.0 years ago

Hi!

The goal of my subproject is to find transcripts that are upregulated upon treatment in the larvae of Spodoptera littoralis. At this stage I have an assembly and the tblastx results (the assembly searched against the large NCBI database). To continue with the annotation, I converted the BLAST results into a GTF file (I wrote a Python script that applies a cutoff and then writes the results in GTF format). The GTF file looks like this:

Slitt_C1    tblastx exon    4225    5697    667 +   .   gene_id "gi|827554818|ref|XM_004929801.2|"; transcript_id "gi|827554818|ref|XM_004929801.2|";
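For readers who want to reproduce this step, the conversion could be sketched roughly as below. This is not the poster's script: the function name `blast_hit_to_gtf`, the e-value cutoff, and the assumption that the BLAST output is tabular (`-outfmt 6`, default column order) are all illustrative guesses.

```python
from typing import Optional


def blast_hit_to_gtf(line: str, max_evalue: float = 1e-5,
                     source: str = "tblastx") -> Optional[str]:
    """Turn one tabular BLAST hit into a GTF 'exon' line, or None if filtered out.

    Assumes the default -outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore (an assumption, not taken
    from the original post).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        return None  # not a complete tabular hit
    qseqid, sseqid = fields[0], fields[1]
    qstart, qend = int(fields[6]), int(fields[7])
    evalue, bitscore = float(fields[10]), float(fields[11])
    if evalue > max_evalue:
        return None  # hypothetical cutoff; the post does not say which one was used
    # GTF requires start <= end; reversed query coordinates are taken as minus strand.
    start, end = min(qstart, qend), max(qstart, qend)
    strand = "+" if qstart <= qend else "-"
    attributes = f'gene_id "{sseqid}"; transcript_id "{sseqid}";'
    # GTF columns must be separated by tabs, not spaces.
    return "\t".join([qseqid, source, "exon", str(start), str(end),
                      f"{bitscore:g}", strand, ".", attributes])
```

Whatever the real script does, it is worth confirming that the fields in its output are tab-separated and that start is never larger than end, since both are required by the GTF format.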


After that, I aligned the reads to the assembly with bowtie2 and tried to run tophat2.

~/bin/tophat2 -G bowtie_result/Slitt.gtf -o tophat_with_annotation/ -p 16 bowtie_result/Slitt ../02_trim/A_R1_P.fq ../02_trim/A_R2_P.fq


And I got the error:

[2016-07-28 11:05:14] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2016-07-28 11:05:14] Checking for Bowtie
Bowtie version:        2.2.8.0
[2016-07-28 11:05:15] Checking for Bowtie index files (genome)..
[2016-07-28 11:05:15] Checking for reference FASTA file
[2016-07-28 11:05:15] Generating SAM header for bowtie_result/Slitt
[2016-07-28 11:05:15] Reading known junctions from GTF file
[2016-07-28 11:08:38] Building transcriptome data files transcriptome_index/Slitt
[2016-07-28 11:08:42] Building Bowtie index from Slitt.fa
[FAILED]
Error: Couldn't build bowtie index with err = 1


According to the log file, the last thing TopHat tried to do was run bowtie2-build on the Slitt.fa in the temp folder.
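One way to get more detail than "err = 1" might be to repeat that bowtie2-build call by hand and look at its stderr. The sketch below assumes `bowtie2-build` is on the PATH and uses its documented form `bowtie2-build <reference_in> <bt2_base>`; the helper names and both paths in the usage comment are hypothetical.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_capture(cmd: List[str]) -> Tuple[int, str]:
    """Run a command and return (exit code, captured stderr)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr


def rerun_index_build(fasta: str, index_base: str,
                      builder: str = "bowtie2-build") -> int:
    """Repeat the index build outside TopHat and print the error text, if any."""
    code, err = run_and_capture([builder, fasta, index_base])
    if code != 0:
        # Show only the tail of stderr; the real reason is usually near the end.
        print(err[-2000:], file=sys.stderr)
    return code


# Hypothetical usage; replace both paths with the real ones from the TopHat log:
# rerun_index_build("path/to/tmp/Slitt.fa", "path/to/test_index")
```

If the FASTA that TopHat wrote to its temp folder turns out to be empty or truncated, that would point at the GTF-to-FASTA step and not at Bowtie itself.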

I have already checked the sequence names in the assembly and the annotation; they are identical, so the error must come from something else. I would appreciate any tips on how to get from the BLAST results to an expression-level file (of course I could run a script that assigns regions of the scaffolds to specific annotations, but that would take a lot of time).

Thank you!

RNA-Seq Tophat Bowtie blast • 1.7k views

Even though it won't fix this problem, I am thinking of using STAR.


Would be a good idea, although it might satisfy you to solve the problem ;-)


Is the Slitt.fa file in the same directory as your Bowtie index (bowtie_result)? If not, try putting a copy in there.


Yep, I think I have tried all the standard solutions that can be found on the internet.


You may want to align against the genome (rather than the transcriptome) to avoid forcing the aligner to mis-align reads, especially in this case, where you don't have a well-defined transcriptome.


In a perfect world I would. However, there is no genome for this species, or even for a closely related one.


TopHat is deprecated to some extent, so using STAR or HISAT2 (if you want to stay within the same family) may be a better option, as you have already indicated.

I would recommend that you try BBMap. If you do use it, remember to add the flag sam=1.3, since BBMap writes SAM v1.4 by default, which featureCounts/HTSeq-count do not understand.
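To illustrate what the sam=1.3 flag changes: as far as I know, the relevant difference is that SAM v1.4 CIGAR strings use `=` (match) and `X` (mismatch), while v1.3 uses `M` for both. The sketch below shows that mapping on a single CIGAR string; the function name is made up, and re-running BBMap with sam=1.3 is the simpler route in practice.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_14_to_13(cigar: str) -> str:
    """Rewrite '=' and 'X' operations as 'M', merging adjacent runs."""
    if cigar == "*":
        return cigar  # unmapped reads carry no CIGAR
    merged = []  # list of [length, op] pairs
    for length, op in _CIGAR_OP.findall(cigar):
        op = "M" if op in "=X" else op
        if merged and merged[-1][1] == op:
            merged[-1][0] += int(length)  # extend the previous run of the same op
        else:
            merged.append([int(length), op])
    return "".join(f"{n}{op}" for n, op in merged)
```

For example, `cigar_14_to_13("10=1X5=2I3=")` gives `"16M2I3M"`.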

5.0 years ago
EagleEye 7.0k
• Have you tried building the Bowtie index on your genome file yourself before running TopHat? That might give us a clue.

• Also check whether the chromosome names match between the GTF and your genome FASTA file.
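The name check in the second point can be automated. A minimal sketch, assuming a plain-text FASTA and a tab-separated GTF; the function names and the file paths in the usage comment are illustrative only:

```python
from typing import Iterable, Set


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Sequence names from FASTA headers, cut at the first whitespace."""
    names = set()
    for line in lines:
        if line.startswith(">") and line[1:].strip():
            names.add(line[1:].split()[0])
    return names


def gtf_seqnames(lines: Iterable[str]) -> Set[str]:
    """Values of the first (seqname) column of a GTF file."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(line.split("\t", 1)[0])
    return names


def names_missing_from_fasta(fasta_lines: Iterable[str],
                             gtf_lines: Iterable[str]) -> Set[str]:
    """GTF sequence names that have no matching FASTA record."""
    return gtf_seqnames(gtf_lines) - fasta_names(fasta_lines)


# Hypothetical usage with the files from the question:
# with open("bowtie_result/Slitt.fa") as fa, open("bowtie_result/Slitt.gtf") as gtf:
#     print(names_missing_from_fasta(fa, gtf))
```

An empty result means every GTF seqname has a FASTA record. A GTF whose columns are separated by spaces instead of tabs would show up here as whole lines reported as missing names.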


bowtie2-build and bowtie2 worked perfectly before the tophat2 step. The index build fails only during the TopHat run. The chromosome names are the same everywhere; I checked several times.