Am I using Tophat2 correctly? Segment-based junction search error keeps coming up.
6.2 years ago
aswartz85 ▴ 20

Although I'm very inexperienced with bioinformatics, what I'm trying to do is very straightforward. I want to align my MiSeq mRNA-seq reads to the mouse transcriptome.

Thus far, I've downloaded the Ensembl GRCm38dna.fa genome file and indexed it with bowtie2-build.

I've also downloaded the Ensembl GRCm38.85.GTF file for transcriptome annotation.

To run tophat, I'm using the following command (default parameters):

tophat2 -G MusGRCm3885.gtf MusGRCm38dna 560RF.fastq

However, the run fails with the error shown at the end of the log below.

I'm not quite sure what's going on. The computer I'm using has ~4 GB RAM. Should I change the min length to <50, considering my mRNA snippets are ~30 bases?

[2016-09-16 13:05:51] Checking for Bowtie
          Bowtie version:
[2016-09-16 13:05:52] Checking for Bowtie index files (genome)..
[2016-09-16 13:05:52] Checking for reference FASTA file
[2016-09-16 13:05:52] Generating SAM header for MusGRCm38dna
[2016-09-16 13:07:06] Reading known junctions from GTF file
[2016-09-16 13:07:33] Preparing reads
     left reads: min. length=50, max. length=50, 20782969 kept reads (99 discarded)
[2016-09-16 13:12:10] Building transcriptome data files ./tophat_out/tmp/MusGRCm3885
[2016-09-16 13:13:41] Building Bowtie index from MusGRCm3885.fa
[2016-09-16 13:30:36] Mapping left_kept_reads to transcriptome MusGRCm3885 with Bowtie2 
[2016-09-16 13:47:27] Resuming TopHat pipeline with unmapped reads
[2016-09-16 13:47:27] Mapping left_kept_reads.m2g_um to genome MusGRCm38dna with Bowtie2 
[2016-09-16 14:33:35] Mapping left_kept_reads.m2g_um_seg1 to genome MusGRCm38dna with Bowtie2 (1/2)
[2016-09-16 15:30:31] Mapping left_kept_reads.m2g_um_seg2 to genome MusGRCm38dna with Bowtie2 (2/2)
[2016-09-16 15:55:52] Searching for junctions via segment mapping
    Coverage-search algorithm is turned on, making this step very slow
    Please try running TopHat again with the option (--no-coverage-search) if this step takes too much time or memory.
Error: segment-based junction search failed with err =-9
    found 0 potential small insertions

RNA-Seq alignment • 2.4k views

You'll have to look through the tophat log to find the last command it's running. If you then run that yourself you'll get the actual underlying error message, which will hopefully be more informative.
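
A minimal sketch of that lookup, assuming TopHat wrote its logs to the default ./tophat_out/logs directory and that the commands it executed are recorded in run.log there (list the directory if your version names the files differently). The helper name last_tophat_command is made up for illustration:

```shell
# Print the last non-empty line of TopHat's command log.
# Assumption: the log is at tophat_out/logs/run.log (default output directory).
last_tophat_command() {
    log="${1:-tophat_out/logs/run.log}"
    if [ ! -f "$log" ]; then
        echo "no such log: $log" >&2
        return 1
    fi
    # Drop blank lines, then keep only the final line.
    grep -v '^[[:space:]]*$' "$log" | tail -n 1
}

# Usage: last_tophat_command tophat_out/logs/run.log
```

Pasting the printed line back into the terminal reruns just the failing step, so whatever it writes to stderr is visible directly.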

Having said that, STAR is faster and tends to produce better results.


Running the coverage search can be very intensive in terms of memory and CPU usage. You might have better luck running it on a cluster using the parallel option.
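
The log in the question itself suggests rerunning with --no-coverage-search. A sketch of that rerun, reusing the file names from the original command; the helper tophat_rerun_cmd is hypothetical and only prints the command so it can be checked first. Whether this gets past err = -9 on a ~4 GB machine is an assumption, not something the log confirms:

```shell
# Inputs copied from the command in the question.
GTF="MusGRCm3885.gtf"
INDEX="MusGRCm38dna"
READS="560RF.fastq"

# Hypothetical helper: print the rerun command with the coverage search disabled.
tophat_rerun_cmd() {
    printf 'tophat2 --no-coverage-search -G %s %s %s\n' "$GTF" "$INDEX" "$READS"
}

tophat_rerun_cmd
# Once the printed command looks right, run it with:
#   eval "$(tophat_rerun_cmd)"
```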


I agree with Devon that you might want to shift to STAR, which will eventually take TopHat2's place. But... what do you mean by "considering my mRNA snippets are ~30 bases"?
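
For reference, the log reports "min. length=50, max. length=50", so the reads handed to TopHat are all 50 bases long. A quick way to confirm that from the FASTQ itself is to tally sequence lengths. This is a sketch that assumes a plain, uncompressed FASTQ with four lines per record; read_length_counts is a made-up name:

```shell
# Tally read lengths in a FASTQ file.
# Assumption: uncompressed FASTQ, 4 lines per record, sequence on line 2.
read_length_counts() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len, n[len] }' "$1" | sort -n
}

# Usage: read_length_counts 560RF.fastq
# Each output line is "<length> <number of reads>".
```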

