Question: Tophat2 with GTF annotation file made from blast results
Anton Shekhov (Germany/Jena/MPI ICE) wrote, 2.6 years ago:

Hi!

The goal of my subproject is to find transcripts that are upregulated upon treatment in Spodoptera littoralis larvae. At the current step I have an assembly and tblastx results (the assembly searched against a huge NCBI database). To continue with the annotation, I converted the BLAST results into a GTF file with a Python script that also applies a cutoff and then writes the results in GTF format. The GTF file looks like this:

Slitt_C1    tblastx exon    4225    5697    667 +   .   gene_id "gi|827554818|ref|XM_004929801.2|"; transcript_id "gi|827554818|ref|XM_004929801.2|";
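For anyone who wants to reproduce this conversion step, here is a minimal Python sketch of turning BLAST hits into GTF lines like the one above. It is not the original script (which isn't shown). It assumes tabular BLAST output (-outfmt 6 with the standard 12 columns), uses a hypothetical e-value cutoff `max_evalue` as the filter, writes the rounded bit score into the score column, and assumes that hits on a negative query frame are reported with qstart > qend, which it turns into a '-' strand record.

```python
def blast_to_gtf(lines, max_evalue=1e-10, source="tblastx"):
    """Convert BLAST tabular (-outfmt 6) hits into GTF exon lines.

    Hypothetical sketch: the e-value cutoff and the use of the bit score
    as the GTF score are illustrative choices, not the original script's.
    """
    gtf = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column outfmt 6 row
        qseqid, sseqid = fields[0], fields[1]
        qstart, qend = int(fields[6]), int(fields[7])
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # apply the cutoff
        strand = "+"
        if qstart > qend:
            # assumed: negative query frames are reported with qstart > qend
            qstart, qend = qend, qstart
            strand = "-"
        # subject ID used as both gene_id and transcript_id, as in the example line
        attributes = f'gene_id "{sseqid}"; transcript_id "{sseqid}";'
        gtf.append("\t".join([
            qseqid, source, "exon", str(qstart), str(qend),
            str(round(bitscore)), strand, ".", attributes,
        ]))
    return gtf
```

Note that GTF columns must be separated by tabs, which the join above guarantees; a file where the tabs were turned into spaces will not parse as GTF.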

After that, I aligned the reads to the assembly with bowtie2 and tried to run TopHat2:

~/bin/tophat2 -G bowtie_result/Slitt.gtf -o tophat_with_annotation/ -p 16 bowtie_result/Slitt ../02_trim/A_R1_P.fq ../02_trim/A_R2_P.fq

I got the following error:

[2016-07-28 11:05:14] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2016-07-28 11:05:14] Checking for Bowtie
                  Bowtie version:        2.2.8.0
[2016-07-28 11:05:15] Checking for Bowtie index files (genome)..
[2016-07-28 11:05:15] Checking for reference FASTA file
[2016-07-28 11:05:15] Generating SAM header for bowtie_result/Slitt
[2016-07-28 11:05:15] Reading known junctions from GTF file
[2016-07-28 11:05:20] Preparing reads
         left reads: min. length=59, max. length=99, 4805446 kept reads (1341 discarded)
        right reads: min. length=59, max. length=99, 4795101 kept reads (11686 discarded)
[2016-07-28 11:08:38] Building transcriptome data files transcriptome_index/Slitt
[2016-07-28 11:08:42] Building Bowtie index from Slitt.fa
        [FAILED]
Error: Couldn't build bowtie index with err = 1

According to the log file, the last thing TopHat tried to do was run bowtie2-build on Slitt.fa in the temp folder.

I have already checked the sequence names in the assembly and the annotation; they are identical, so the error must come from something else. I would appreciate any tips on how to get from BLAST results to expression levels (of course I could write a script that assigns scaffold regions to specific annotations, but that would take a lot of time).
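If you do end up writing the script that assigns scaffold regions to annotations, its core is interval-overlap counting, which is what featureCounts or HTSeq-count do for you. A minimal Python sketch, assuming the alignments have already been extracted from the SAM/BAM output as hypothetical (contig, start, end) tuples with 1-based inclusive coordinates. Alignments overlapping more than one gene_id are set aside as ambiguous rather than counted; the `__ambiguous`/`__no_feature` keys borrow HTSeq-count's naming for illustration only.

```python
import re
from collections import Counter, defaultdict


def parse_gtf_exons(lines):
    """Group GTF exon records by sequence name: {seqname: [(start, end, gene_id)]}."""
    by_contig = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match:
            by_contig[fields[0]].append((int(fields[3]), int(fields[4]), match.group(1)))
    return by_contig


def count_alignments(alignments, exons_by_contig):
    """Count (contig, start, end) alignments per gene_id, 1-based inclusive coordinates.

    Alignments overlapping exactly one gene_id are counted for it; those
    overlapping several go to "__ambiguous", those overlapping none to
    "__no_feature". Naive linear scan per contig: fine for a sketch.
    """
    counts = Counter()
    for contig, start, end in alignments:
        genes = {gene for s, e, gene in exons_by_contig.get(contig, [])
                 if start <= e and end >= s}
        if len(genes) == 1:
            counts[genes.pop()] += 1
        elif genes:
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return counts
```

Each alignment is counted separately here, so both mates of a pair count; a real counting tool would also handle fragments, strandedness and multi-mapping reads, which is a good reason to use featureCounts or HTSeq-count instead.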

Thank you!

Tags: rna-seq, blast, bowtie, tophat

Even though it won't fix this problem, I am thinking of using STAR.

(reply by Anton Shekhov, 2.6 years ago)

That would be a good idea, although it might be satisfying to solve the problem ;-)

(reply by WouterDeCoster, 2.6 years ago)

Is the Slitt.fa file in the same directory as your bowtie index (bowtie_result)? If not, try putting a copy there.
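One way to check this suggestion is to list which expected files are missing next to the index prefix. A Python sketch, assuming the standard Bowtie2 index suffixes (`.1.bt2` through `.rev.2.bt2`, or their `.bt2l` variants for large indexes) and assuming, as the suggestion implies, that TopHat looks for the reference FASTA as `<prefix>.fa`; `missing_index_files` is a hypothetical helper name.

```python
from pathlib import Path

# Standard Bowtie2 index suffixes; large indexes use ".bt2l" instead of ".bt2".
BT2_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def missing_index_files(prefix):
    """Return expected file names missing next to a Bowtie2 index prefix.

    Also checks for "<prefix>.fa", assumed here to be where TopHat looks
    for the reference FASTA.
    """
    base = Path(prefix)
    missing = []
    for suffix in BT2_SUFFIXES:
        small = base.with_name(base.name + suffix)
        large = base.with_name(base.name + suffix + "l")
        if not small.exists() and not large.exists():
            missing.append(small.name)
    fasta = base.with_name(base.name + ".fa")
    if not fasta.exists():
        missing.append(fasta.name)
    return missing
```

For the command in the question, this would be called as `missing_index_files("bowtie_result/Slitt")`; an empty list means every expected file is present.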

(reply by genomax, 2.6 years ago)

Yep, I think I have tried all the standard solutions on the internet.

(reply by Anton Shekhov, 2.6 years ago)

You may want to align against the genome (rather than the transcriptome) to avoid forcing the aligner to mis-align reads, especially here, where you don't have a well-defined transcriptome.

(reply by genomax, 2.6 years ago)

In a perfect world I would. However, there is no genome for this species or even for a closely related one.

(reply by Anton Shekhov, 2.6 years ago)

TopHat is deprecated to some extent, so STAR or HISAT2 (if you want to stay within the same family) may be a better option, as you have already indicated.

I would recommend that you try BBMap. If you do use it, remember to add the flag sam=1.3, since the default SAM output is version 1.4, which featureCounts/HTSeq-count do not understand.
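For context on the sam=1.3 advice: the sketch below assumes the relevant difference lies in the CIGAR strings, where SAM 1.4-style output can use `=` (sequence match) and `X` (mismatch) while 1.3-style output uses `M` for both. This Python function only illustrates what that conversion means; it is not part of BBMap, `cigar_to_sam13` is a hypothetical name, and with BBMap itself you would simply pass sam=1.3.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_sam13(cigar):
    """Rewrite a CIGAR string so "=" and "X" become "M", merging adjacent M runs."""
    if cigar == "*":
        return cigar  # unavailable CIGAR stays as-is
    merged = []
    for length, op in CIGAR_OP.findall(cigar):
        if op in "=X":
            op = "M"
        if merged and merged[-1][1] == "M" and op == "M":
            merged[-1][0] += int(length)  # extend the previous M run
        else:
            merged.append([int(length), op])
    return "".join(f"{length}{op}" for length, op in merged)
```

For example, `10=1X20=` becomes `31M`, and `5S10=2I3X` becomes `5S10M2I3M`.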

(reply by genomax, 2.6 years ago)
Answer: EagleEye (Sweden) wrote, 2.6 years ago:
  • Have you tried building the Bowtie index on your genome file before running TopHat? That might give us some clue.

  • Also check whether the chromosome names match between the GTF and your genome FASTA file.
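The name check in the second point can be scripted rather than done by eye. A minimal Python sketch, assuming the GTF is tab-separated and that the relevant sequence name is the first whitespace-delimited token of each FASTA header (the part aligners usually keep as the reference name). It returns the GTF sequence names that have no matching FASTA record; the function names are hypothetical.

```python
def fasta_ids(lines):
    """First whitespace-delimited token of each FASTA header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gtf_seqnames(lines):
    """Column 1 (seqname) of every non-blank, non-comment GTF line."""
    names = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def missing_seqnames(gtf_lines, fasta_lines):
    """GTF sequence names with no matching FASTA record, sorted."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_ids(fasta_lines))
```

Both functions accept open file objects, so the check can be run directly on the GTF and the assembly FASTA; an empty result means every GTF sequence name has a FASTA record.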


bowtie2-build and bowtie2 worked perfectly before the TopHat2 step; the index build failed only during the TopHat run. The chromosome names are the same everywhere; I checked several times.

(reply by Anton Shekhov, 2.6 years ago)