Question: bbmerge merged read smaller than original reads
21 months ago, snishtala03 wrote:


I have paired-end (2x150 bp) RNA-Seq reads from a MiSeq run for a viral genome, and I need to merge the reads for a downstream analysis. (Many of the read pairs overlap substantially, so merging them makes sense.) I ran bbmerge with these parameters: in1=R1.fastq in2=R2.fastq out=merged.fastq outu1=R1_unmerged.fastq outu2=R2_unmerged.fastq

Here is the terminal output I get from bbmerge:

Pairs:                  3328768
Joined:                 2925342         87.881%
Ambiguous:              370409          11.128%
No Solution:            33017           0.992%
Too Short:              0               0.000%
Avg Insert:             176.0
Standard Deviation:     44.0
Mode:                   147

Insert range:           35 - 293
90th percentile:        243
75th percentile:        204
50th percentile:        167
25th percentile:        142
10th percentile:        126
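As a side note on reading these statistics, here is a minimal Python sketch of how insert size relates to merged length for 2x150 bp reads. It assumes a merged pair spans exactly the insert, so any insert shorter than 150 bp gives a merged read shorter than either original read. The function names are hypothetical, and this says nothing about whether a specific pair was merged correctly.

```python
READ_LEN = 150  # from the post: 2x150 bp reads

def overlap(insert_size: int, read_len: int = READ_LEN) -> int:
    """Bases covered by both mates for a given insert size.

    Inserts shorter than the read are fully covered by both mates;
    longer inserts overlap by 2 * read_len - insert_size
    (a negative value means the mates do not overlap at all).
    """
    return min(insert_size, 2 * read_len - insert_size)

def merged_shorter_than_read(insert_size: int, read_len: int = READ_LEN) -> bool:
    """Assumption: the merged read length equals the insert size."""
    return insert_size < read_len

# Percentiles reported in the bbmerge output above
percentiles = {10: 126, 25: 142, 50: 167, 75: 204, 90: 243}
for pct, insert in percentiles.items():
    print(pct, insert, overlap(insert), merged_shorter_than_read(insert))
```

With the percentiles above, inserts at the 10th and 25th percentile (126 and 142) are below 150, so at least a quarter of the merged reads would be expected to be shorter than the original reads under this assumption.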

I then use bwa to align the reads to my reference genome, allowing secondary alignments, and in many cases a read aligns to multiple regions of the genome. While going over the alignments, I found some strange behaviour in the merged reads:

  1. Merged read is smaller than the original read, for example:
    @M02091:32:000000000-C28N4:1:1106:22793:14654 1:N:0:7
    @M02091:32:000000000-C28N4:1:1106:22793:14654 2:N:0:7
    Merged read is -
    I also tried alternative merging software, vsearch and flash, to compare results. Interestingly, both flash and vsearch merge this read pair correctly (see below), but a similar case comes up with a different example -
    My command line for vsearch is -
    vsearch --fastq_mergepairs R1.fastq --reverse R2.fastq --eetabbedout error_stats --fastqout merged.fastq --fastqout_notmerged_fwd fw_unmerged.fastq --fastqout_notmerged_rev rev_unmerged.fastq
    My command line for flash is -
    flash R1.fastq R2.fastq -M 151
  2. This question is not about merging but about the nature of my reads. As the example above shows, my R1 gets reverse complemented, which suggests that the R1.fastq and R2.fastq files each contain a mix of forward and reverse reads. Is there a way to fix this and put all R1 reads in one file and all R2 reads in the other? I am trying to remove duplicates after aligning my reads, and this causes problems because it prevents reads from being deduplicated correctly.
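One way to look at point 2 is to inspect the SAM flags of the alignments rather than the sequence as displayed. The sketch below assumes the reverse-complemented R1 was seen in aligner output, which this thread does not confirm. It relies only on the standard SAM flag bits: 0x10 means the sequence is stored reverse complemented, 0x40 marks the first mate and 0x80 the second. The helper name `describe_flag` is hypothetical.

```python
# SAM flag bits as defined in the SAM specification
FLAG_REVERSE = 0x10         # SEQ is stored reverse complemented
FLAG_FIRST_IN_PAIR = 0x40   # read is the first mate (R1)
FLAG_SECOND_IN_PAIR = 0x80  # read is the second mate (R2)

def describe_flag(flag: int) -> dict:
    """Report which mate a record is and which strand it aligned to."""
    if flag & FLAG_FIRST_IN_PAIR:
        mate = "R1"
    elif flag & FLAG_SECOND_IN_PAIR:
        mate = "R2"
    else:
        mate = "unpaired"  # e.g. merged reads aligned as single-end
    strand = "-" if flag & FLAG_REVERSE else "+"
    return {"mate": mate, "strand": strand}

for flag in (99, 147, 83, 163):
    print(flag, describe_flag(flag))
```

Under this reading, an R1 read aligned to the reverse strand (for example flag 83) is still an R1 read. Only its stored sequence is reverse complemented, so seeing this does not by itself show that the FASTQ files are mixed.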
21 months ago, h.mon wrote:

Which version of BBTools are you using? I just tested the example sequence you provided, and BBTools 38.43 merged the pair correctly:

@M02091:32:000000000-C28N4:1:1106:22793:14654 1:N:0:7

In addition, the merged read you showed as an example has a very strange substitution at position 160. At this position the reads do not overlap, so the consensus should correspond to read 1. However, the consensus read has a T there, while the original read 1 has a C. Is the example you showed from vsearch or flash? Does either of them perform some form of error correction?
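To illustrate why a substitution outside the overlap is surprising, here is a minimal Python sketch of a naive merge. It is not how BBMerge, vsearch or flash work internally. It assumes the insert size is already known and ignores quality scores, and the names `naive_merge` and `reverse_complement` are hypothetical. Positions covered by only one mate are copied verbatim from that mate, so the merged base there can only differ from the original read if a tool applies some extra correction.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def naive_merge(r1: str, r2: str, insert_size: int):
    """Merge a read pair given a known insert size (simplifying assumption).

    Returns the merged sequence and, per position, which mate supplied the base.
    Disagreements inside the overlap become 'N' because qualities are ignored.
    """
    if insert_size < max(len(r1), len(r2)):
        raise ValueError("sketch only handles inserts at least as long as the reads")
    r2_fwd = reverse_complement(r2)        # bring R2 onto the R1 strand
    r2_start = insert_size - len(r2_fwd)   # where R2 begins within the insert
    merged, sources = [], []
    for pos in range(insert_size):
        b1 = r1[pos] if pos < len(r1) else None
        b2 = r2_fwd[pos - r2_start] if pos >= r2_start else None
        if b1 and b2:
            merged.append(b1 if b1 == b2 else "N")
            sources.append("both")
        elif b1:
            merged.append(b1)              # non-overlap: copied from R1 unchanged
            sources.append("r1")
        elif b2:
            merged.append(b2)              # non-overlap: copied from R2 unchanged
            sources.append("r2")
        else:
            merged.append("N")             # gap between mates, no coverage
            sources.append("gap")
    return "".join(merged), sources

# Toy example: a 16 bp fragment sequenced with 10 bp reads
fragment = "ACGTACGTACGGTTAA"
r1 = fragment[:10]
r2 = reverse_complement(fragment[-10:])
merged, sources = naive_merge(r1, r2, len(fragment))
print(merged)
print(sources)
```

In the toy example only the middle four positions are covered by both mates. Every other merged base is taken directly from a single read.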


Thank you for your response. I was using an older version. I updated to the current one, 38.44, and now I get correctly merged reads!

I used vsearch for that example. I think it does account for errors, but I am not sure.

Reply written 21 months ago by snishtala03

Which older version of BBTools? It would be interesting to know the version affected.

Reply written 21 months ago by h.mon