Question: How To Determine PCR Duplicate/Redundant Reads In NGS Data?
JacobS (Cleveland, Ohio) wrote, 7.2 years ago:


I am looking for various methods for determining PCR duplicates/redundant reads in NGS data. So far I have come across the MarkDuplicates tool in Picard and the rmdup command in SAMtools. Does anyone know of other software packages that perform this function?


EDIT: I will collect any software found to be able to mark PCR duplicates in this list:

  • SAMtools
  • Picard
Tags: duplicates, qc, pcr

Just curious: why are you looking for other tools? Is there some feature or behavior you need that Picard and SAMtools do not currently provide?

(Reply by Dan D, 7.2 years ago)
Istvan Albert (University Park, USA) answered, 7.2 years ago:

There are a few ways to go about it. There are tools that

  • look for exact matches via an associative array (hash, dictionary): for example the fastx_collapser in the fastx toolkit.
  • look for exact matches by sorting the sequences and removing consecutive identical sequences; for that you could use a combination of command-line tools such as sort and uniq
  • look for reads that align over the same region; for this to work the data need to be aligned against a reference genome. samtools rmdup works this way
  • cluster the reads and merge reads that are very similar to one another using a tool like uclust
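The first two approaches (exact matching by hashing, and exact matching by sorting) can be sketched in a few lines of Python. This is a minimal illustration on an in-memory list of sequences, not the implementation used by fastx_collapser or any other tool; a real pipeline would stream sequences from a FASTQ/FASTA file. The function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def collapse_with_hash(seqs):
    """Exact-match collapsing via an associative array (hash table).

    Counter is a dict subclass: each distinct sequence becomes a key and
    its value is the number of times that sequence occurred.
    """
    return Counter(seqs)


def collapse_with_sort(seqs):
    """Exact-match collapsing by sorting, similar in spirit to `sort | uniq -c`.

    After sorting, identical sequences sit next to each other, so each run
    of consecutive equal items is one distinct sequence.
    """
    collapsed = []
    for seq, run in groupby(sorted(seqs)):
        collapsed.append((seq, sum(1 for _ in run)))
    return collapsed


reads = ["ACGT", "TTGA", "ACGT", "ACGT", "GGCC", "TTGA"]
print(dict(collapse_with_hash(reads)))  # {'ACGT': 3, 'TTGA': 2, 'GGCC': 1}
print(collapse_with_sort(reads))        # [('ACGT', 3), ('GGCC', 1), ('TTGA', 2)]
```

Both variants only merge sequences that are exactly identical: a duplicate carrying a single sequencing error, or reported as its reverse complement, is counted as a distinct read. That limitation is what the clustering approach is meant to address.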

Ideally, duplicates are best removed after alignment, but depending on the problem that may not be feasible (for example, when no reference genome is available).
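Post-alignment deduplication can be sketched as follows, assuming single-end reads that are already aligned and represented by a hypothetical `Alignment` record (in practice these would be read from a BAM file). Reads that share chromosome, strand and 5' position are treated as duplicates, and the one with the highest mapping quality is kept. This is roughly the idea behind samtools rmdup; Picard MarkDuplicates flags duplicates instead of deleting them and chooses the representative read differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    # Hypothetical minimal record; field names are assumptions for illustration.
    name: str
    chrom: str
    start: int   # leftmost aligned position
    end: int     # rightmost aligned position
    strand: str  # "+" or "-"
    mapq: int    # mapping quality


def five_prime(aln):
    # A forward-strand read starts at its leftmost base; a reverse-strand
    # read's 5' end is its rightmost aligned base.
    return aln.start if aln.strand == "+" else aln.end


def dedup_by_position(alignments):
    """Keep one alignment per (chrom, 5' position, strand), preferring higher MAPQ."""
    best = {}
    for aln in alignments:
        key = (aln.chrom, five_prime(aln), aln.strand)
        kept = best.get(key)
        if kept is None or aln.mapq > kept.mapq:
            best[key] = aln
    return list(best.values())


alns = [
    Alignment("r1", "chr1", 100, 150, "+", 60),
    Alignment("r2", "chr1", 100, 148, "+", 30),  # same 5' start as r1: duplicate
    Alignment("r3", "chr1", 120, 150, "-", 60),  # reverse strand, 5' end = 150
    Alignment("r4", "chr1", 100, 150, "-", 40),  # also 5' end = 150 on "-": duplicate of r3
]
print(sorted(a.name for a in dedup_by_position(alns)))  # ['r1', 'r3']
```

Real tools also handle paired-end reads (using the coordinates of both mates) and soft-clipped alignments, which this sketch ignores.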

For more details search this site for "remove duplicates" to find good posts on various tools and techniques.
