Question: bbmerge merged read smaller than original reads
snishtala03 wrote (7 months ago):

Hello,

I have paired-end (2x150 bp) RNA-Seq reads from a MiSeq run for a viral genome, and I need to merge the reads for a downstream analysis. (Merging also makes sense because, when I merge, I see that many read pairs overlap heavily.) This is the command I use:

bbmerge.sh in1=R1.fastq in2=R2.fastq out=merged.fastq outu1=R1_unmerged.fastq outu2=R2_unmerged.fastq

Here is the terminal output I get from bbmerge:

Pairs:                  3328768
Joined:                 2925342         87.881%
Ambiguous:              370409          11.128%
No Solution:            33017           0.992%
Too Short:              0               0.000%
Avg Insert:             176.0
Standard Deviation:     44.0
Mode:                   147

Insert range:           35 - 293
90th percentile:        243
75th percentile:        204
50th percentile:        167
25th percentile:        142
10th percentile:        126
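
As a quick sanity check on the summary above, the reported categories can be re-derived from the raw counts. This is only a sketch that copies the numbers from the bbmerge output shown here; the variable names are illustrative and not part of BBTools.

```python
# Counts copied from the bbmerge summary above.
pairs = 3328768
counts = {
    "Joined": 2925342,
    "Ambiguous": 370409,
    "No Solution": 33017,
    "Too Short": 0,
}

# Every pair should fall into exactly one category.
total = sum(counts.values())

# Percentages rounded to three decimals, as bbmerge prints them.
percentages = {name: round(100 * n / pairs, 3) for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name:12s} {counts[name]:>8d} {pct:7.3f}%")
print("categories sum to total pairs:", total == pairs)
```

The four categories add up to the total number of pairs, so no pair is silently dropped at the merging step.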

Next, I use bwa to align the merged reads to my reference genome, allowing secondary alignments, and there are many cases where a read aligns to multiple regions of the genome. While going over the alignments, I found some strange behaviour in the merged reads:

  1. The merged read is shorter than the original reads, for example:
    @M02091:32:000000000-C28N4:1:1106:22793:14654 1:N:0:7
    GTCTTTGGGTATACATTTGAACCCTAATAAAACCAAACGTTGGGGCTACTCCCTTAACTTCATGGGATATGTAATTGGAAGTTGGGGTACTTTACCACAGGAACATATTGTAATGAAACTCAAGCAATGTTTTCGGAAACTGCCTGTAAAT
    +
    DCEEEFFFEBFFGGGGGGGGGGHGHHHHHHHHHGHHHHGHHHHGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGGGGGHHHHHHHHHHHH 
    
    @M02091:32:000000000-C28N4:1:1106:22793:14654 2:N:0:7
    AAAGAATTGTGGGTCTTTTGGGCTTTGCTGCCCCTTTTACACAATGTGGCTATCCTGCTTTGACAGACTTTCCAATCAATAGGTCTATTTACAGGCAGTTTCCGAAAACATTGCTTGAGTTTCATTACAATATGTTCCTGTGGTAAAGTAC
    +
    CCDDDFFFFFFCGGGGGGGGGGHHHHHHHHHHHGHHHHHHHHHGHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHGHHHHHGGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHF
    
    The merged read is:
    AAAGAATTGTGGGTCTTTTGGGCTTTGCTGCCCCTTTTACACAATGTGGCTATCCTGCTTTGA
    
    I also tried alternative merging software (vsearch and flash) to compare my results. Interestingly, both flash and vsearch merge this read pair correctly (see below), but a similar case comes up with a different example:
    AAAGAATTGTGGGTCTTTTGGGCTTTGCTGCCCCTTTTACACAATGTGGCTATCCTGCTTTGACAGACTTTCCAATCAATAGGTCTATTTACAGGCAGTTTCCGAAAACATTGCTTGAGTTTCATTACAATATGTTCCTGTGGTAAAGTACCCCAACTTTCAATTACATAACCCATGAAGTTAAGGGAGTAGCCCCAACGTTTGGTTTTATTAGGGTTCAAATGTATACCCAAAGAC
    
    My command line for vsearch is:
    vsearch --fastq_mergepairs R1.fastq --reverse R2.fastq --eetabbedout error_stats --fastqout merged.fastq --fastqout_notmerged_fwd fw_unmerged.fastq --fastqout_notmerged_rev rev_unmerged.fastq
    
    My command line for flash is:
    flash R1.fastq R2.fastq -M 151
    
  2. This question is not about merging but about the nature of my reads. As you can see from the example above, my R1 is reverse-complemented in the merged read, which suggests that the R1.fastq and R2.fastq files each contain a mix of forward and reverse reads. Is there a way to fix this and put all forward reads in one file and all reverse reads in the other? I am trying to remove duplicates after aligning my reads, and this mix prevents the reads from being deduplicated correctly.
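
For readers following the example, here is a minimal sketch of what overlap-based merging does to a read pair. It is not the algorithm used by bbmerge, vsearch, or flash: it only accepts exact matches, ignores quality scores, and does not handle inserts shorter than the read length. The function names `reverse_complement` and `naive_merge` and the toy `fragment` are made up for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def naive_merge(first, second, min_overlap=10):
    """Merge a read pair by exact suffix/prefix overlap.

    `second` is reverse-complemented so both reads are on the same strand,
    then the longest exact overlap between the end of `first` and the
    start of that sequence is used. Returns None if no overlap is found.
    """
    second_rc = reverse_complement(second)
    for k in range(min(len(first), len(second_rc)), min_overlap - 1, -1):
        if first[-k:] == second_rc[:k]:
            return first + second_rc[k:]
    return None


# Toy 26 bp fragment sequenced with 18 bp reads from each end (hypothetical data).
fragment = "ACGTTGCAAGGCTTACCGATAGGCTA"
r1 = fragment[:18]
r2 = reverse_complement(fragment[-18:])

merged_r1_first = naive_merge(r1, r2)
merged_r2_first = naive_merge(r2, r1)
print(merged_r1_first)
print(merged_r2_first)
```

Merging with R1 first or with R2 first gives the same fragment on opposite strands, so a merged read that starts with the R2 sequence and ends with the reverse complement of R1 is not by itself a sign that the input files mix forward and reverse reads.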
h.mon wrote (7 months ago):

Which version of BBTools are you using? I just tested the sequences you provided as an example, and bbmerge.sh (BBTools 38.43) merged the pair correctly:

@M02091:32:000000000-C28N4:1:1106:22793:14654 1:N:0:7
GTCTTTGGGTATACATTTGAACCCTAATAAAACCAAACGTTGGGGCTACTCCCTTAACTTCATGGGATATGTAATTGGAAGTTGGGGTACTTTACCACAGGAACATATTGTAATGAAACTCAAGCAATGTTTTCGGAAACTGCCTGTAAATAGACCTATTGATTGGAAAGTCTGTCAAAGCAGGATAGCCACATTGTGTAAAAGGGGCAGCAAAGCCCAAAAGACCCACAATTCTTT
+
DCEEEFFFEBFFGGGGGGGGGGHGHHHHHHHHHGHHHHGHHHHGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHGHHHHHHHHHGHHHHHHHHHHHGGGGGGGGGGCFFFFFFDDDCC
  

In addition, the merged read you showed as an example has a very strange substitution at position 160. The reads do not overlap at this position, so the consensus should correspond to read 1. However, the consensus read has a T there, while the original read 1 has a C. Is the example you showed from vsearch or flash? Does either of them perform some form of error correction?
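
The reasoning about which read covers a given position can be sketched as follows. This is an illustrative helper, not part of any merging tool; `covering_reads` is a hypothetical name, "first" and "second" refer to the order in which the reads appear in the merged sequence, and the sketch assumes the merged read is at least as long as each input read.

```python
def covering_reads(pos, len_first, len_second, merged_len):
    """Return which reads cover a 1-based position of the merged read.

    The first read spans positions 1..len_first of the merged read and the
    second read (after reverse complementing) spans the last len_second
    positions. Positions covered by both lie in the overlap.
    """
    if not 1 <= pos <= merged_len:
        raise ValueError("position outside merged read")
    second_start = merged_len - len_second + 1
    cover = []
    if pos <= len_first:
        cover.append("first")
    if pos >= second_start:
        cover.append("second")
    return cover


# Toy numbers: two 18 bp reads merged into a 26 bp fragment (10 bp overlap).
for pos in (8, 9, 18, 19):
    print(pos, covering_reads(pos, 18, 18, 26))
```

A position covered by only one read has no second observation to build a consensus from, so the merged base there would normally be expected to match that read.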


Thank you for your response. I was using an older version. I updated to the current one (38.44) and now I get correctly merged reads!

I used vsearch for that example. I think it does account for errors, but I am not sure.

(Reply written 7 months ago by snishtala03)

Which older version of BBTools? It would be interesting to know the version affected.

(Reply written 7 months ago by h.mon)