Question

Bowtie2 keeps exiting with "Error reading _plen[] array:"

9 months ago

biobaker • 0

Hello, and thanks in advance for your help!

I have an assembled metagenome, and I am trying to map reads back to the assembled genes to estimate their relative abundance. I have tried doing this a few different ways, but always get the same error:

Error reading _plen[] array: 252280792, 252285144
Error: Encountered internal Bowtie 2 exception (#1)

The only other place I've seen this error mentioned is here, but the suggestions in that post (remake the index or allocate more memory) have not worked.

Here's what I tell bowtie to do:

bowtie2 -x cbdb.reference --interleaved test/input.corr.fastq -S cb.sam -p 6 --verbose

The index "cbdb.reference" is a dereplicated database of gene sequences from across multiple samples. I have tried using a database without dereplication and still get the same error, so I don't suspect that's the issue.

I have also tried using raw reads, filtered reads, and filtered corrected reads, and still get the same error. I've tried passing them with --interleaved and as unpaired reads with -U, but as they are interleaved paired reads, I'm not sure that this is the cause of the issue.

Here is what the first 20 lines of my filtered, corrected reads file look like:

@A00178:530:HV7LKDSX7:4:2140:14561:18818 1:N:0:CTTCTGAGAT+CTTCTGAGGT
GACCTCTCGGATATCGTTTCCACGTACTCGAATCTTGTAGGATAGCGACCGCTTGAGCTGACCCGAAGCGACCCCGTAGTTCTTATTCTTTCCAATCTTGCGACCACCGAGATGACGCTTTGCGCTCTTGACGATATCATCGGAGAACGC
+
FFFF,FFFFFFFF:FF,FFFFFF:FFFF:FFF:FF,FFFFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFFFFF,:::F:F:FF,FFF,F:F,FF,FFFFFFFFFFF:F::FFFFF,FFF:FFF,FFFFFFF:::,,FF,FFF,F,F
@A00178:530:HV7LKDSX7:4:2140:14561:18818 2:N:0:CTTCTGAGAT+CTTCTGAGGT
CCATTGGCTACGTTAGAACGGCTTATTTCGCTTATGACGGAACTCACGGAAATCGAAGTACGGGAGTTGTCTTCCTACGTTCTCAACTCGCAATCGTTTCCGTTTGGGGCAATCGTTCCAAGTACGACGACGGTAAGGACATATGAAGAA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:F,FFFFFFFFF:F:FFFFFFFFFFFFFFF:FFFFFFFF
@A00178:530:HV7LKDSX7:4:2372:7274:36417 1:N:0:CTTCTGAGAT+CTTCTGAGGT
CCTCTCGGATATCGTTTCCACGTACTCGAATCTTGTAGGATAGCGACCGCTTGAGCTGACCCGAAGCGACCCCGTAGTTCTTATTCTTTCCAATCTTGCGACCACCGAGATGACGCTTTGCGCTCTTGACGATATCATCGGAGAACGCTA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF,FFFFFFF:FFFFFF:FFFF
@A00178:530:HV7LKDSX7:4:2372:7274:36417 2:N:0:CTTCTGAGAT+CTTCTGAGGT
GTGGGTACAAAAAACACCAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
:,,,,,,,,F,,:,F,,,,,,,,FFFFF,:,::F::FFF::FFFFFFFFFF,,:,,,FF:F::,,:,:,:,,,:F:F:::,F:F,FFFFF:F,FFFF,FFF:FFF,,F:,FF:::FF,F,F:F:FFFF,FF,FFFFFF:F::F,,:F,FF
@A00178:530:HV7LKDSX7:4:1546:28492:26287 1:N:0:CTTCTGAGAT+CTTCTGAGGT
CTCTCGGATATCGTTTCCACGTACTCGAATCTTGTAGGATAGCGACCGCTTGAGCTGACCCGAAGCGACCCCGTAGTTCTTATTCTTTCCAATCTTGCGACCACCGAGATGACGCTTTGCGCTCTTGACGATATCATCGGAGAACGCTAG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

For reference, I've pasted the entire output from bowtie. I am unfortunately stuck using version 2.4.1 for now, as I'm running this on my institution's HPC cluster where I don't have permissions to update the version. What am I doing wrong??

(INFO): After arg handling:
(INFO):   Binary args:
[ -x cbdb.reference -S cb2806.sam -p 8 --verbose --interleaved forReadMapping/280in_ermophilus_2_FD/QC_and_Genome_Assembly/JGI_assembly_of__Metagenome_Minimal_Draft_-_2806_ASSEMBLY_DATE20231117/input.corr.fastq ]
(INFO): Cannot find a small index but a large one seems to be present.
(INFO): Switching to using the large index (cbdb.reference.1.bt2l).
(INFO): "/data/apps/linux-centos8-cascadelake/gcc-9.3.0/bowtie2-2.4.1-olhlhcsbsqtec57sarerebmkju7azx4q/bin/bowtie2-align-l" --wrapper basic-0 -x "cbdb.reference" -S "cb2806.sam" -p 8 --verbose --interleaved "test/input.corr.fastq"
Applying preset: 'sensitive' using preset menu 'V0'
Final policy string: 'SEED=0;SEEDLEN=22;DPS=15;ROUNDS=2;IVAL=S,1,1.15'
Entered driver(): 10:46:13
Creating PatternSource: 10:46:13
Opening hit output file: 10:46:13
About to initialize fw Ebwt: 10:46:13
  About to open input files: 10:46:13
Opening "cbdb.reference.1.bt2l"
Opening "cbdb.reference.2.bt2l"
  Finished opening input files: 10:46:13
  Reading header: 10:46:13
Headers:
    len: 13116164520
    bwtLen: 13116164521
    sz: 3279041130
    bwtSz: 3279041131
    lineRate: 7
    offRate: 4
    offMask: 0xfffffffffffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 160
    ftabLen: 1048577
    ftabSz: 8388616
    offsLen: 819760283
    offsSz: 6558082264
    lineSz: 128
    sideSz: 128
    sideBwtSz: 96
    sideBwtLen: 384
    numSides: 34156679
    numLines: 34156679
    ebwtTotLen: 4372054912
    ebwtTotSz: 4372054912
    color: 0
    reverse: 0
Reading plen (31535643): 10:46:13
Error reading _plen[] array: 252280792, 252285144
Error: Encountered internal Bowtie 2 exception (#1)
Command: /data/apps/linux-centos8-cascadelake/gcc-9.3.0/bowtie2-2.4.1-olhlhcsbsqtec57sarerebmkju7azx4q/bin/bowtie2-align-l --wrapper basic-0 -x cbdb.reference -S cb2806.sam -p 8 --verbose --interleaved test/input.corr.fastq 
(ERR): bowtie2-align exited with value 1

bowtie2 • 1.2k views

updated 9 months ago by GenoMax 154k • written 9 months ago by biobaker • 0

-1 test/reads.fastq.gz -2 test/reads.fastq.gz

This may be the problem. You appear to be using the same file name (unless you are obfuscating the names) as input for both reads.

Warning: Same mate file "test/reads.fastq.gz" appears as argument to both -1 and -2

This appears to be confirmed by the warning in the log above.

Are your paired-end data files in sync, i.e. do they have the same number of reads in the same order?

9 months ago by GenoMax 154k

As I mentioned in my post, I have also tried entering the reads as a single file with --interleaved or -U. The fastq.gz file unzips to a single fastq, but it should contain paired-end reads. They were generated by JGI (if that helps for context). I will edit my post above to show what the first 20 lines of the fastq look like.

9 months ago by biobaker • 0

Update: I have confirmed that my reads are indeed interleaved in the single file, so I should only be using --interleaved mode. I'll update the post! Unfortunately, this still hasn't solved the issue.

9 months ago by biobaker • 0

do they have the same number of reads and in the same order.

You did not address this comment.

You can de-interleave the reads using reformat.sh from the BBMap suite like this, and see whether both files actually contain the same number of reads.

reformat.sh -Xmx4g in=input.fq.gz out1=R1.fq.gz out2=R2.fq.gz

If they do, then try providing them as separate files to see if that helps.

My hunch is that the reads in this file are out of sync and that may be part of the issue. If they are out of sync, you will need to run repair.sh (also from BBMap) on them to remove singletons.
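
If BBMap is not available, a rough sync check can also be done with awk. This is only a sketch: it assumes an uncompressed FASTQ with exactly four lines per record and mates stored back to back, and check_interleaved is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: counts records in an interleaved FASTQ and checks
# that each consecutive pair of records shares the same read name.
check_interleaved() {
    # $1: path to an uncompressed interleaved FASTQ (4 lines per record)
    awk '
        NR % 4 == 1 {
            n++
            name = $1                  # read ID up to the first space
            sub(/\/[12]$/, "", name)   # drop an old-style /1 or /2 suffix
            if (n % 2 == 1) { first = name }
            else if (name != first) { bad++ }
        }
        END {
            printf "records=%d mismatched_pairs=%d\n", n, bad
            # non-zero exit if the record count is odd or any pair mismatched
            exit (n % 2 != 0 || bad > 0)
        }
    ' "$1"
}
```

Running check_interleaved test/input.corr.fastq prints the record count and the number of mismatched pairs, and returns a non-zero status if the file looks out of sync.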

9 months ago by GenoMax 154k

Right, as I said in my follow-up comment, I realized that the files I was using before were identical and interleaved. That is why I changed my command to pass the file once with --interleaved instead of as mate pairs.

I think the issue may actually be in the index. I built the index by concatenating all of my gene sequences from all samples, but I am mapping reads from each individual sample to this large index. As a test, I just now made a new index for gene sequences from only the sample in question, and now bowtie works. I am not sure why it wouldn't work when the original index was larger.

9 months ago by biobaker • 0

I am not sure why it wouldn't work when the original index was larger.

Does bowtie2-inspect return information about the original index, i.e. without errors?

If you are willing, you could try using bbmap.sh as an alternative aligner.
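
A minimal sketch of that bowtie2-inspect check, assuming bowtie2-inspect is on your PATH; index_is_readable is a made-up helper name. The -s option prints a summary of the index, and a non-zero exit status suggests the index files cannot be read.

```shell
# Hypothetical helper: reports whether bowtie2-inspect can read the
# summary of an index. Assumes bowtie2-inspect is available on PATH.
index_is_readable() {
    # $1: index basename, e.g. cbdb.reference
    if bowtie2-inspect -s "$1" > /dev/null 2>&1; then
        echo "index readable: $1"
    else
        echo "index NOT readable: $1"
        return 1
    fi
}
```

For example, index_is_readable cbdb.reference returns 0 only when bowtie2-inspect itself exits cleanly.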

9 months ago by GenoMax 154k

Thank you! bowtie2-inspect returns the following:

bowtie2-inspect-l: word_io.h:125: T readU(FILE*, bool) [with T = unsigned int; FILE = _IO_FILE]: Assertion `false' failed.

I also just realized that the larger index did not have any .rev.1.bt2 files. My smaller index did, and it worked. Could this possibly be the source of the issue?

If this continues to not work, I will definitely give bbmap a try!

9 months ago by biobaker • 0

It looks like your original index was incomplete or corrupt, which may explain the initial issue. There are no duplicate names or sequences in the original file, correct?
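
For what it's worth, the two numbers in the original error are consistent with a truncated file: the log reports 31535643 plen entries, and 31535643 × 8 bytes = 252285144, which is the second number, while the first (252280792) is slightly smaller. I read that as bytes read versus bytes expected, but that interpretation is an assumption on my part. A large index built by bowtie2-build should consist of six files (.1.bt2l, .2.bt2l, .3.bt2l, .4.bt2l, .rev.1.bt2l, .rev.2.bt2l), so a quick completeness check could look like the sketch below; check_bt2l_index is a hypothetical helper name.

```shell
# Hypothetical helper: lists which of the six expected large-index files
# are missing or empty for a given index basename.
check_bt2l_index() {
    # $1: index basename, e.g. cbdb.reference
    missing=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        f="$1.$suffix.bt2l"
        if [ ! -s "$f" ]; then          # -s: file exists and is non-empty
            echo "missing or empty: $f"
            missing=$((missing + 1))
        fi
    done
    echo "missing=$missing"
    [ "$missing" -eq 0 ]                # exit status reflects completeness
}
```

All six files being present and non-empty does not prove the index is intact, so rebuilding with bowtie2-build and confirming that it exits without error is still the safer route.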

9 months ago by GenoMax 154k