Question: Bowtie / Tophat failing on unmapped reads
spitzer0 wrote (16 months ago):

I'm very new to working with TopHat and Bowtie. I'm trying to align a set of paired-end RNA-seq reads to a reference genome. According to tophat.log, left_kept_reads and right_kept_reads are mapped to the transcriptome successfully. However, when the TopHat pipeline resumes with the unmapped reads and Bowtie2 starts mapping left_kept_reads.m2g_um to the reference genome, it logs the message "[bam_header_read] EOF marker is absent. The input is probably truncated." About a minute later, the process throws an error and terminates.

When I examine the files left_kept_reads.m2g_um.bam and right_kept_reads.m2g_um.bam, I find that both are missing the 28-byte block at the end that samtools recognizes as the end-of-file (EOF) marker of a .bam file. I assume that is what causes the crash, but I don't know why the EOF block isn't being written or what I can do about it.
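The check described above can be reproduced with standard shell tools. This is a minimal sketch, assuming POSIX/GNU tail and od are available: it compares the last 28 bytes of each file against the standard BGZF EOF block defined in the SAM/BAM specification. The script and the function name has_bgzf_eof are hypothetical. Recent samtools releases also provide samtools quickcheck, which performs a similar integrity test.

```shell
#!/usr/bin/env bash
# Sketch: report whether each BAM file ends with the 28-byte BGZF EOF block.

# The BGZF EOF block, split into its fields:
# gzip magic + CM + FLG, MTIME, XFL + OS, XLEN, "BC" subfield id, SLEN,
# BSIZE-1 (27, i.e. a 28-byte block), empty deflate block, CRC32, ISIZE.
BGZF_EOF_HEX="1f8b0804""00000000""00ff""0600""4243""0200""1b00""0300""00000000""00000000"

# Returns 0 if the file exists and its last 28 bytes match the EOF block.
has_bgzf_eof() {
    local f="$1"
    [ -f "$f" ] || return 1
    local tail_hex
    # Dump the final 28 bytes as lowercase hex with no spaces or newlines.
    tail_hex=$(tail -c 28 "$f" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}

# Check every file given on the command line.
for bam in "$@"; do
    if has_bgzf_eof "$bam"; then
        echo "$bam: EOF marker present"
    else
        echo "$bam: EOF marker missing (file is probably truncated)"
    fi
done
```

For example, save it as check_bam_eof.sh (a hypothetical name) and run bash check_bam_eof.sh left_kept_reads.m2g_um.bam right_kept_reads.m2g_um.bam, adjusting the paths to wherever TopHat wrote the files. A missing marker only shows that the file was not closed cleanly; it does not by itself explain why the writing step stopped.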


The tophat commands I'm running are:

module load samtools/1.8
module load boost/1.66.0_gcc5+
module load bowtie2/2.3.2
module load tophat/2.0.13

tophat -r 200 -G /project/bf528/project_2/reference/annot/mm9.gtf --segment-length=20 --segment-mismatches=1 --no-novel-juncs -o P0_2_tophat -p 16 /project/bf528/project_2/reference/mm9 P0_1_1.fastq P0_1_2.fastq

The tophat.log file shows:

  • [2019-06-05 13:12:56] Beginning TopHat run (v2.0.13)
  • [2019-06-05 13:12:56] Checking for Bowtie
        Bowtie version: 2.3.2.0
  • [2019-06-05 13:12:56] Checking for Bowtie index files (genome)..
  • [2019-06-05 13:12:56] Checking for reference FASTA file
  • [2019-06-05 13:12:56] Generating SAM header for /project/bf528/project_2/reference/mm9
  • [2019-06-05 13:12:58] Reading known junctions from GTF file
  • [2019-06-05 13:13:05] Preparing reads
        left reads: min. length=40, max. length=40, 21561496 kept reads (16066 discarded)
        right reads: min. length=40, max. length=40, 21347948 kept reads (229614 discarded)
  • [2019-06-05 13:18:56] Building transcriptome data files P0_2_tophat/tmp/mm9
  • [2019-06-05 13:19:13] Building Bowtie index from mm9.fa
  • [2019-06-05 13:31:32] Mapping left_kept_reads to transcriptome mm9 with Bowtie2
  • [2019-06-05 13:40:33] Mapping right_kept_reads to transcriptome mm9 with Bowtie2
  • [2019-06-05 13:49:22] Resuming TopHat pipeline with unmapped reads
  • [2019-06-05 13:49:22] Mapping left_kept_reads.m2g_um to genome mm9 with Bowtie2
  • [bam_header_read] EOF marker is absent. The input is probably truncated.
  • [2019-06-05 13:49:36] Retrieving sequences for splices
  • [2019-06-05 13:50:42] Indexing splices [FAILED]
  • Error: Splice sequence indexing failed with err =1

Thanks in advance!


Unless there is a dire need for TopHat, use a current aligner such as STAR where possible.

Written 16 months ago by genomax.

First of all, I agree with genomax: you should consider a more recent RNA-seq aligner.

Maybe the problem is the SAMtools version you are loading. From the TopHat release notes:

TopHat 2.0.13 release 10/2/2014

Version 2.0.13 is a maintenance release with the following changes:

  • removed SAMtools as an external dependency in order to avoid incompatibility issues with recent and future changes of SAMtools and its code library (an older, stable SAMtools version is now packaged with TopHat)

I would expect TopHat2 to use its bundled SAMtools preferentially, but you could try running without "module load samtools/1.8" and see whether that helps.
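A minimal sketch of that suggestion, assuming the cluster uses an environment-modules system (Lmod or similar) in which "module unload samtools" drops whichever samtools version is loaded. The function name run_tophat_without_samtools_module is hypothetical; the module names and tophat arguments are copied from the question. It also prints which samtools, if any, is still on PATH, so you can confirm the external 1.8 build is out of the way.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rerun the original TopHat job without the external
# samtools/1.8 module loaded.
run_tophat_without_samtools_module() {
    # Assumption: this unloads any samtools module; ignore errors if none is loaded.
    module unload samtools 2>/dev/null || true
    module load boost/1.66.0_gcc5+
    module load bowtie2/2.3.2
    module load tophat/2.0.13

    # Report which samtools (if any) is still reachable on PATH.
    echo "samtools on PATH: $(command -v samtools || echo none)"

    # Same arguments as the original command.
    tophat -r 200 -G /project/bf528/project_2/reference/annot/mm9.gtf \
        --segment-length=20 --segment-mismatches=1 --no-novel-juncs \
        -o P0_2_tophat -p 16 \
        /project/bf528/project_2/reference/mm9 P0_1_1.fastq P0_1_2.fastq
}
```

Calling run_tophat_without_samtools_module from the job script would replace the four module load lines and the tophat line; you may prefer to point -o at a fresh directory so results from the failed run are not mixed in.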

Another thing to consider is a possible incompatibility between this TopHat version (from 2014) and this Bowtie2 version (from 2017). You could try updating both tools to their latest versions.

Written 16 months ago by h.mon.