Question: Removing duplicate Illumina reads before hybrid de novo assembly, PacBio read correction, or polishing a PacBio-only assembly
2.8 years ago by
VS710 wrote:

Hi All,

I was wondering how important it is to get rid of exact duplicate Illumina reads before:

  1. Using them to correct PacBio reads (planning to use proovread)
  2. Using them to polish a PacBio-only assembly with Pilon (the assembly was built from uncorrected PacBio reads using miniasm)
  3. Using them for a hybrid de novo assembly with PBcR

Some of my Illumina libraries contain a significant number of reads that are duplicated more than 10 times. How would you recommend handling these duplicate reads in the scenarios above?

Many thanks in advance!

2.7 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

It's not a good idea to remove duplicate reads unless your libraries are amplified. If they are amplified, and you have reads appearing 10+ times, I highly recommend you change to an unamplified protocol, because you are wasting sequence. And by duplicates, I mean that both read 1 and read 2 of pairs are duplicates... otherwise the pairs are not, in fact, duplicates.

However, if you are using an amplified library and duplicate pairs do occur, I recommend replacing each set of duplicates with a single copy of their consensus, in any situation other than quantification (e.g. RNA-seq).
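To make the pair-level definition concrete, here is a minimal Python sketch, not how any particular tool does it. It assumes exact matching only, reads already parsed into tuples, and Phred+33 quality strings; the function name `collapse_duplicate_pairs` and the tuple layout are hypothetical. A pair counts as a duplicate only when both read 1 and read 2 match. Because exact duplicates share the same bases, the "consensus" here reduces to keeping one copy with the best per-base quality seen; duplicates that differ by sequencing errors would need a real consensus call.

```python
def collapse_duplicate_pairs(pairs):
    """Collapse exact duplicate read pairs into a single representative.

    pairs: iterable of (name, seq1, qual1, seq2, qual2) tuples
           (hypothetical layout; qualities assumed to be Phred+33 strings).
    Returns a list of (name, seq1, qual1, seq2, qual2, copies) tuples.
    """
    groups = {}
    for name, seq1, qual1, seq2, qual2 in pairs:
        # Key on BOTH mates: same read 1 with a different read 2 is not a duplicate.
        key = (seq1, seq2)
        if key not in groups:
            groups[key] = [name, qual1, qual2, 1]
        else:
            g = groups[key]
            # Phred+33 characters sort by quality, so max() keeps the best score per base.
            g[1] = "".join(max(a, b) for a, b in zip(g[1], qual1))
            g[2] = "".join(max(a, b) for a, b in zip(g[2], qual2))
            g[3] += 1
    return [
        (name, s1, q1, s2, q2, copies)
        for (s1, s2), (name, q1, q2, copies) in groups.items()
    ]


example_pairs = [
    ("p1", "ACGT", "IIII", "TTGA", "II#I"),
    ("p2", "ACGT", "I#II", "TTGA", "IIII"),  # both mates match p1: duplicate
    ("p3", "ACGT", "IIII", "GGCA", "IIII"),  # only read 1 matches: not a duplicate
]
collapsed = collapse_duplicate_pairs(example_pairs)
```

In this toy input, `p1` and `p2` collapse into one pair, while `p3` is kept because its read 2 differs.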


Thanks, this seems reasonable.

written 2.7 years ago by VS710