Removing duplicate Illumina reads before hybrid de novo assembly, PacBio read correction, or polishing a PacBio-only assembly
7.8 years ago
VS ▴ 730

Hi All,

I was wondering how important it is to remove exact duplicate Illumina reads in each of the following cases:

  1. Before using them to correct PacBio reads (I am planning to use ProovRead)
  2. Before using them to polish a PacBio-only assembly with Pilon (the assembly was built from uncorrected PacBio reads with miniasm)
  3. Before using them for a hybrid de novo assembly with PBcR

Some of my Illumina libraries contain a significant number of reads that are duplicated more than 10 times. How would you recommend handling these duplicate reads in the scenarios above?

Many thanks in advance!

assembly ProovRead PBcR Pilon duplicate reads • 3.5k views
7.8 years ago

It's not a good idea to remove duplicate reads unless your libraries are amplified. If they are amplified and you have reads appearing 10+ times, I highly recommend switching to an unamplified protocol, because you are wasting sequence. And by duplicates, I mean pairs in which both read 1 and read 2 are duplicated; if only one mate matches, the pairs are not, in fact, duplicates.
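
To make the pair-level definition concrete, here is a minimal Python sketch that counts duplicates by the combination of read 1 and read 2 rather than by either mate alone. The function name `pair_duplication_counts` and the toy sequences are made up for illustration, and it assumes the pairs are already loaded in memory as plain strings; it is not how any particular deduplication tool works internally.

```python
from collections import Counter

def pair_duplication_counts(pairs):
    """Count how often each exact (read 1, read 2) sequence pair occurs.

    `pairs` is an iterable of (read1_sequence, read2_sequence) tuples.
    This hypothetical helper only detects exact matches on both mates.
    """
    return Counter((r1, r2) for r1, r2 in pairs)

# Toy data (invented for illustration)
pairs = [
    ("ACGT", "TTGA"),
    ("ACGT", "TTGA"),  # same read 1 and same read 2: a true duplicate pair
    ("ACGT", "GGCA"),  # same read 1 but different read 2: not a duplicate
]

counts = pair_duplication_counts(pairs)
duplicated = {pair: n for pair, n in counts.items() if n > 1}
print(duplicated)  # {('ACGT', 'TTGA'): 2}
```

Counting on read 1 alone would report three copies of `ACGT` here, which overstates the duplication level.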

However, if you are using an amplified library and duplicate pairs do occur, I recommend eliminating all duplicates and replacing them with a single copy of their consensus, in any situation other than quantification (e.g. RNA-seq).
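
As a rough illustration of collapsing duplicates to a single copy, the sketch below keeps one record per distinct (read 1, read 2) pair. For exact duplicates the consensus sequence is simply the shared sequence, so the only thing left to merge is the quality string; taking the highest quality at each position is an assumption made here for simplicity, not a rule from any specific tool. The function name `collapse_duplicate_pairs` and the record layout are hypothetical, and quality strings are assumed to be Phred+33 encoded and equal in length within a duplicate group.

```python
def collapse_duplicate_pairs(records):
    """Collapse exact duplicate pairs into one record each.

    `records` is an iterable of (seq1, qual1, seq2, qual2) tuples.
    Returns one tuple per distinct (seq1, seq2), in first-seen order,
    with the per-base maximum quality character across the duplicates.
    """
    best = {}
    for seq1, qual1, seq2, qual2 in records:
        key = (seq1, seq2)
        if key not in best:
            best[key] = (qual1, qual2)
        else:
            old1, old2 = best[key]
            # In Phred+33, a higher character code means a higher quality.
            best[key] = (
                "".join(max(a, b) for a, b in zip(old1, qual1)),
                "".join(max(a, b) for a, b in zip(old2, qual2)),
            )
    return [(s1, q1, s2, q2) for (s1, s2), (q1, q2) in best.items()]

# Toy data (invented for illustration)
records = [
    ("ACGT", "IIII", "TTGA", "II#I"),
    ("ACGT", "I#II", "TTGA", "IIII"),  # duplicate of the first pair
    ("ACGT", "IIII", "GGCA", "IIII"),  # different read 2, kept separately
]

collapsed = collapse_duplicate_pairs(records)
for record in collapsed:
    print(record)
```

This holds every distinct pair in memory, so it is only meant to show the idea; for real libraries a dedicated deduplication tool is the practical choice.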


Thanks, this seems reasonable.
