Hello, and thanks in advance for your help!
I have an assembled metagenome, and I am trying to map reads back to the assembled genes to estimate their relative abundance. I have tried doing this a few different ways, but always get the same error:
Error reading _plen[] array: 252280792, 252285144
Error: Encountered internal Bowtie 2 exception (#1)
The only other place I've seen this error mentioned is here, but the suggestions in that post (remake the index or allocate more memory) have not worked.
Here's what I tell bowtie to do:
bowtie2 -x cbdb.reference --interleaved test/input.corr.fastq -S cb.sam -p 6 --verbose
The index "cbdb.reference" is a dereplicated database of gene sequences from across multiple samples. I have tried using a database without dereplication and still get the same error, so I don't suspect that's the issue.
I have also tried using raw reads, filtered reads, and filtered corrected reads, and still the same error. I've tried entering them as --interleaved or as -U unpaired, but as they are interleaved paired reads, I'm not sure that this is the cause of the issue.
Here is what the first 20 lines of my filtered corrected reads file look like:
@A00178:530:HV7LKDSX7:4:2140:14561:18818 1:N:0:CTTCTGAGAT+CTTCTGAGGT
GACCTCTCGGATATCGTTTCCACGTACTCGAATCTTGTAGGATAGCGACCGCTTGAGCTGACCCGAAGCGACCCCGTAGTTCTTATTCTTTCCAATCTTGCGACCACCGAGATGACGCTTTGCGCTCTTGACGATATCATCGGAGAACGC
+
FFFF,FFFFFFFF:FF,FFFFFF:FFFF:FFF:FF,FFFFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFFFFF,:::F:F:FF,FFF,F:F,FF,FFFFFFFFFFF:F::FFFFF,FFF:FFF,FFFFFFF:::,,FF,FFF,F,F
@A00178:530:HV7LKDSX7:4:2140:14561:18818 2:N:0:CTTCTGAGAT+CTTCTGAGGT
CCATTGGCTACGTTAGAACGGCTTATTTCGCTTATGACGGAACTCACGGAAATCGAAGTACGGGAGTTGTCTTCCTACGTTCTCAACTCGCAATCGTTTCCGTTTGGGGCAATCGTTCCAAGTACGACGACGGTAAGGACATATGAAGAA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:F,FFFFFFFFF:F:FFFFFFFFFFFFFFF:FFFFFFFF
@A00178:530:HV7LKDSX7:4:2372:7274:36417 1:N:0:CTTCTGAGAT+CTTCTGAGGT
CCTCTCGGATATCGTTTCCACGTACTCGAATCTTGTAGGATAGCGACCGCTTGAGCTGACCCGAAGCGACCCCGTAGTTCTTATTCTTTCCAATCTTGCGACCACCGAGATGACGCTTTGCGCTCTTGACGATATCATCGGAGAACGCTA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF,FFFFFFF:FFFFFF:FFFF
@A00178:530:HV7LKDSX7:4:2372:7274:36417 2:N:0:CTTCTGAGAT+CTTCTGAGGT
GTGGGTACAAAAAACACCAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
:,,,,,,,,F,,:,F,,,,,,,,FFFFF,:,::F::FFF::FFFFFFFFFF,,:,,,FF:F::,,:,:,:,,,:F:F:::,F:F,FFFFF:F,FFFF,FFF:FFF,,F:,FF:::FF,F,F:F:FFFF,FF,FFFFFF:F::F,,:F,FF
@A00178:530:HV7LKDSX7:4:1546:28492:26287 1:N:0:CTTCTGAGAT+CTTCTGAGGT
CTCTCGGATATCGTTTCCACGTACTCGAATCTTGTAGGATAGCGACCGCTTGAGCTGACCCGAAGCGACCCCGTAGTTCTTATTCTTTCCAATCTTGCGACCACCGAGATGACGCTTTGCGCTCTTGACGATATCATCGGAGAACGCTAG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
For reference, I've pasted the entire output from bowtie. I am unfortunately stuck using version 2.4.1 for now, as I'm running this on my institution's HPC cluster where I don't have permissions to update the version. What am I doing wrong??
(INFO): After arg handling:
(INFO): Binary args:
[ -x cbdb.reference -S cb2806.sam -p 8 --verbose --interleaved forReadMapping/280in_ermophilus_2_FD/QC_and_Genome_Assembly/JGI_assembly_of__Metagenome_Minimal_Draft_-_2806_ASSEMBLY_DATE20231117/input.corr.fastq ]
(INFO): Cannot find a small index but a large one seems to be present.
(INFO): Switching to using the large index (cbdb.reference.1.bt2l).
(INFO): "/data/apps/linux-centos8-cascadelake/gcc-9.3.0/bowtie2-2.4.1-olhlhcsbsqtec57sarerebmkju7azx4q/bin/bowtie2-align-l" --wrapper basic-0 -x "cbdb.reference" -S "cb2806.sam" -p 8 --verbose --interleaved "test/input.corr.fastq"
Applying preset: 'sensitive' using preset menu 'V0'
Final policy string: 'SEED=0;SEEDLEN=22;DPS=15;ROUNDS=2;IVAL=S,1,1.15'
Entered driver(): 10:46:13
Creating PatternSource: 10:46:13
Opening hit output file: 10:46:13
About to initialize fw Ebwt: 10:46:13
About to open input files: 10:46:13
Opening "cbdb.reference.1.bt2l"
Opening "cbdb.reference.2.bt2l"
Finished opening input files: 10:46:13
Reading header: 10:46:13
Headers:
len: 13116164520
bwtLen: 13116164521
sz: 3279041130
bwtSz: 3279041131
lineRate: 7
offRate: 4
offMask: 0xfffffffffffffff0
ftabChars: 10
eftabLen: 20
eftabSz: 160
ftabLen: 1048577
ftabSz: 8388616
offsLen: 819760283
offsSz: 6558082264
lineSz: 128
sideSz: 128
sideBwtSz: 96
sideBwtLen: 384
numSides: 34156679
numLines: 34156679
ebwtTotLen: 4372054912
ebwtTotSz: 4372054912
color: 0
reverse: 0
Reading plen (31535643): 10:46:13
Error reading _plen[] array: 252280792, 252285144
Error: Encountered internal Bowtie 2 exception (#1)
Command: /data/apps/linux-centos8-cascadelake/gcc-9.3.0/bowtie2-2.4.1-olhlhcsbsqtec57sarerebmkju7azx4q/bin/bowtie2-align-l --wrapper basic-0 -x cbdb.reference -S cb2806.sam -p 8 --verbose --interleaved test/input.corr.fastq
(ERR): bowtie2-align exited with value 1
This may be the problem. You appear to be using the same file name (unless you are obfuscating the names) as input for both reads.
This appears to be confirmed by the warning in the log above.
Are your paired-end data files in sync, i.e. do they have the same number of reads, in the same order?
As I mentioned in my post, I have also tried entering the reads as a single file with --interleaved or -U. The fastq.gz file unzips to a single fastq, but it should contain paired-end reads. They were generated by JGI (if that helps for context). I will edit my post above to show what the first 20 lines of the fastq look like.
Update: I have confirmed that my reads are indeed interleaved in the single file, so I should only be using --interleaved mode. I'll update the post! Unfortunately, this still hasn't solved the issue.
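For anyone who wants to run the same kind of check, here is a minimal Python sketch of one way to confirm that a file is properly interleaved. It is not necessarily the procedure used above. It assumes Illumina-style headers like the ones shown in the post (read name, a space, then a comment starting with the mate number), and the function names are made up for illustration.

```python
def read_fastq_headers(handle):
    """Yield the header line of each 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        handle.readline()  # sequence
        handle.readline()  # '+' separator
        handle.readline()  # quality string
        yield header


def split_header(header):
    """Split '@name 1:N:0:BARCODE' into (name, mate_number)."""
    name, _, comment = header.partition(" ")
    return name, comment.split(":", 1)[0]


def check_interleaved(path):
    """Return (pairs_examined, problems) for an interleaved FASTQ file.

    A problem is a pair whose names differ, a pair whose mate numbers
    are not 1 followed by 2, or a trailing read with no mate.
    """
    pairs = 0
    problems = 0
    with open(path) as handle:
        headers = read_fastq_headers(handle)
        for first in headers:
            second = next(headers, None)
            if second is None:
                problems += 1  # odd number of records
                break
            name1, mate1 = split_header(first)
            name2, mate2 = split_header(second)
            if name1 != name2 or (mate1, mate2) != ("1", "2"):
                problems += 1
            pairs += 1
    return pairs, problems
```

Calling `check_interleaved("test/input.corr.fastq")` on the uncompressed file should report zero problems if every read is followed by its mate.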
You did not address this comment.
You can de-interleave the reads using reformat.sh from the BBMap suite and see if the number of reads is actually the same in both files. If it is, then try providing them as separate files to see if that helps.
My hunch is that the reads in this file are out of sync and that may be part of the issue. If they are out of sync you will need to run repair.sh on them to remove singletons.
Right, as I said in my follow-up comment, I realized that the files I was using before were identical and interleaved. That is why I changed my command to pass the file once with --interleaved instead of as mate pairs.
I think the issue may actually be in the index. I built the index by concatenating all of my gene sequences from all samples, but I am mapping reads from each individual sample to this large index. As a test, I just now made a new index for gene sequences from only the sample in question, and now bowtie works. I am not sure why it wouldn't work when the original index was larger.
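The numbers in the error message are at least consistent with a problem in the index files rather than in the reads. The arithmetic below uses only values from the log above. It rests on two assumptions that are not confirmed anywhere in this thread: that a large (.bt2l) index stores each plen entry as an 8-byte integer, and that the two numbers in the error are the bytes actually read followed by the bytes expected.

```python
# Values copied from the --verbose log and the error message above.
n_plen_entries = 31535643   # from "Reading plen (31535643)"
first_number = 252280792    # first number in the _plen[] error
second_number = 252285144   # second number in the _plen[] error

# Assumption: 8 bytes per entry in a large (.bt2l) index.
BYTES_PER_ENTRY = 8

expected_bytes = n_plen_entries * BYTES_PER_ENTRY
shortfall = second_number - first_number

print(expected_bytes == second_number)  # True
print(shortfall)                        # 4352
```

Under those assumptions, the read of the plen array came up 4352 bytes short, which is what a truncated or incompletely written index file would look like.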
Does bowtie2-inspect return information about the original index, i.e. no errors? If you are willing, you could try using bbmap.sh as an alternative aligner.
Thank you! bowtie2-inspect returns the following
I also just realized that the larger index did not have any .rev.1.bt2 files. My smaller index did, and it worked. Could this possibly be the source of the issue?
If this continues to not work, I will definitely give bbmap a try!
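On the missing .rev files: a complete Bowtie 2 index consists of six files (.1, .2, .3, .4, .rev.1 and .rev.2), with the extension .bt2 for a small index and .bt2l for a large one. The sketch below lists any of those files that are missing or empty for a given index basename. The function name is made up for illustration, and a file that is present but truncated would still pass this check.

```python
import os

# The six components of a Bowtie 2 index, before the .bt2 / .bt2l extension.
SUFFIXES = (".1", ".2", ".3", ".4", ".rev.1", ".rev.2")


def missing_index_files(basename, large=True):
    """Return the expected index files that are absent or zero-length.

    basename is the value passed to bowtie2 -x, e.g. "cbdb.reference".
    large selects the .bt2l extension used by large indexes.
    """
    ext = ".bt2l" if large else ".bt2"
    expected = [basename + suffix + ext for suffix in SUFFIXES]
    return [
        path
        for path in expected
        if not os.path.isfile(path) or os.path.getsize(path) == 0
    ]
```

For the index in this thread, `missing_index_files("cbdb.reference")` would be expected to return an empty list if the build finished cleanly. A build job that was killed partway (for example by a wall-time or memory limit on a cluster) is one common way to end up with the forward files but no .rev files.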
Looks like your original index was incomplete or corrupt, so perhaps that explains the initial issue. There are no duplicates in the original file in terms of names or sequences, correct?
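The name part of that question can be checked with a short script. This is a rough sketch that assumes the reference is a plain (uncompressed) FASTA file and treats the first whitespace-delimited token of each header as the sequence name. It does not compare the sequences themselves, and the function name is made up for illustration.

```python
from collections import Counter


def duplicate_names(fasta_path):
    """Return {name: count} for sequence names that occur more than once."""
    counts = Counter()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                # Assumption: the name is the first token of the header.
                counts[fields[0] if fields else ""] += 1
    return {name: n for name, n in counts.items() if n > 1}
```

An empty dictionary means every header name in the concatenated gene file is unique.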