Question: Should I merge forward and reverse reads before mapping?
5.8 years ago by
United States
qiyunzhu130 wrote:

Dear Community,

I have been working on estimating the relative abundance of selected bacterial strains in a metagenomic dataset derived from whole-genome sequencing using an Illumina TruSeq kit and a HiSeq sequencer. My plan is to do initial quality control using Trimmomatic, followed by mapping to reference bacterial genomes using Bowtie2. I read that I can merge forward and reverse reads before mapping, using tools like PEAR.

Now I am wondering if this merging step is recommended or not? Will this make the subsequent mapping more accurate?

Also, since I am quantifying reads mapped to each bacterial genome, once I merge the reads and map them, I should treat merged and unmerged reads differently in the calculation. For example, one hit from a merged read should be counted twice, to be comparable with the two hits from an unmerged forward and reverse read pair. Am I right?

Thanks in advance!

modified 5.8 years ago by epistatic180 • written 5.8 years ago by qiyunzhu130
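For illustration, the weighting proposed in the question can be sketched as below. This is only a sketch under stated assumptions: the function name `relative_abundance` and the strain names are hypothetical, each hit is assumed to be already assigned to exactly one genome, and no normalization for genome length is applied.

```python
from collections import Counter


def relative_abundance(merged_hits, unmerged_hits):
    """Estimate per-genome relative abundance in read equivalents.

    merged_hits / unmerged_hits: iterables of genome names, one entry per
    mapped read. Both arguments are hypothetical inputs for this sketch.
    """
    weights = Counter()
    for genome in merged_hits:
        weights[genome] += 2  # a merged read stands in for both mates
    for genome in unmerged_hits:
        weights[genome] += 1  # each unmerged mate counts once
    total = sum(weights.values())
    if total == 0:
        return {}
    return {genome: w / total for genome, w in weights.items()}


# Made-up hits: strainA gets 2 merged + 1 unmerged, strainB gets 1 merged + 3 unmerged.
example = relative_abundance(
    merged_hits=["strainA", "strainA", "strainB"],
    unmerged_hits=["strainA", "strainB", "strainB", "strainB"],
)
print(example)
```

Counting fragments instead (one per merged read, one per mapped pair) would give the same proportions as long as both mates of every unmerged pair map to the same genome.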
5.8 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

I did a comparison of the accuracy of various merging tools here.  PEAR did not perform very well.

But, generally, I don't recommend merging before mapping.  It's more relevant to assembly.  As you note, it would make coverage calculations a bit more tricky.  And it would possibly introduce some bias, without any particular benefit.


Merging before mapping is actually quite useful for detecting midsize (~50-400bp) insertions, as that ability is dependent on read length. Other than that scenario I still don't recommend it.

modified 3.4 years ago • written 5.8 years ago by Brian Bushnell17k

Thanks for your valuable information!

written 5.8 years ago by qiyunzhu130

I have been wondering: wouldn't longer reads be mapped to the reference more accurately?

written 5.8 years ago by qiyunzhu130

Yes, but aligners also try to keep pairs together.  So if read 1 could map to 5 locations, and read 2 could map to 3 locations, but there is only one location where both could map nearby, that is the site that will be selected.  So there should not be much difference in sensitivity or specificity between paired reads and merged reads.

written 5.8 years ago by Brian Bushnell17k
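The pairing argument in the reply above can be illustrated with a toy example. This is a simplified sketch, not how Bowtie2 or any other aligner is actually implemented: the function name `concordant_placements`, the coordinates, and the `max_insert` threshold are all made up for illustration, and real aligners also consider read orientation and alignment scores.

```python
def concordant_placements(r1_hits, r2_hits, max_insert=500):
    """Return (pos1, pos2) combinations where both mates land near each other.

    r1_hits / r2_hits: candidate mapping positions on one reference sequence.
    max_insert: hypothetical maximum distance allowed between the two mates.
    """
    return [
        (p1, p2)
        for p1 in r1_hits
        for p2 in r2_hits
        if abs(p2 - p1) <= max_insert
    ]


# Read 1 has 5 candidate locations and read 2 has 3 (made-up coordinates).
r1_hits = [1_000, 52_000, 130_500, 410_000, 777_000]
r2_hits = [52_250, 300_000, 905_000]

placements = concordant_placements(r1_hits, r2_hits)
print(placements)  # only one combination places both mates close together
```

Although each mate is ambiguous on its own, the pair is not, which is why mapping as a pair recovers most of the benefit of a longer merged read.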

That sounds reasonable. Thanks for your explanation!

written 5.8 years ago by qiyunzhu130

Hi Brian,

I am somewhat confused by your explanation; could you please clarify it for me?

I understand that when merging two paired reads, only their overlapping part is merged. If they are in innie orientation, only their ends (the arrowheads) are merged, and the larger tail portions remain the same. Isn't it the tail portion of both reads that matters for mapping? So mapping would not be affected, would it? And if we keep the pairing information by merging into a longer read, will that increase mapping accuracy?
I can illustrate the idea as follows:

R1 maps to 5 locations  ------------------------------>


                                               <----------------------------------- R2 map to 3 locations

Thank you in advance for your ideas and suggestions!


modified 5.2 years ago • written 5.2 years ago by pbigbig210

A correctly merged read will map more accurately than either read1 or read2 alone, because it is longer.  But when mapped as a pair, the accuracy should be similar whether merged or unmerged.  Merging has the advantage of reducing the substitution error rate in the overlapping region, but it has the disadvantage of potentially introducing indels in false-positive merges.  That's very rare with BBMerge, though.

written 5.2 years ago by Brian Bushnell17k
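To make the point about substitution errors in the overlapping region concrete, here is a minimal sketch of a quality-based consensus merge. It is not the algorithm used by BBMerge or PEAR: it assumes the overlap length is already known and correct, whereas real tools have to detect the overlap, and a wrong overlap is exactly what produces the false-positive merges mentioned above. The function names and the example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, q1, r2, q2, overlap):
    """Merge a forward read with its reverse mate, given a known overlap length.

    q1 / q2 are lists of Phred quality scores, one per base. In the
    overlapping region the base with the higher quality is kept.
    """
    if not 0 < overlap <= min(len(r1), len(r2)):
        raise ValueError("overlap must be between 1 and the shorter read length")
    r2_rc = reverse_complement(r2)  # bring the mate onto the forward strand
    q2_rc = q2[::-1]                # qualities must be reversed to match
    consensus = [
        b1 if s1 >= s2 else b2
        for b1, s1, b2, s2 in zip(
            r1[-overlap:], q1[-overlap:], r2_rc[:overlap], q2_rc[:overlap]
        )
    ]
    return r1[:-overlap] + "".join(consensus) + r2_rc[overlap:]


# Made-up 12 bp fragment sequenced as 2 x 8 bp, so the mates overlap by 4 bp.
fragment = "ACGTTAGCCATG"
r1 = "ACGTTATC"                       # G->T error at index 6, inside the overlap
q1 = [30, 30, 30, 30, 30, 30, 5, 30]  # the erroneous base has low quality
r2 = reverse_complement(fragment[4:])
q2 = [30] * 8

merged = merge_pair(r1, q1, r2, q2, overlap=4)
print(merged == fragment)
```

In this example the low-quality error in read 1 is replaced by the higher-quality base from read 2, so the merged read matches the original fragment.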

Thank you Brian !

written 5.1 years ago by pbigbig210
5.8 years ago by
Cambridge, MA
epistatic180 wrote:

Would paired reads be better than merged reads for fusion/rearrangement detection?  I have long overlapping paired reads and am using BWA-MEM for alignment.  I have been merging the overlap into a single long read and then aligning.

written 5.8 years ago by epistatic180

BWA-MEM is pretty good at reporting multiple chimeric local alignments from a single read, so it may not matter too much.  Though I find it simpler to treat the reads as pairs, because when merging you always end up with two classes of reads - merged and unmerged - which need to be treated differently.

written 5.8 years ago by Brian Bushnell17k