Question: Definition of PCR duplicates based on alignment coordinates
3.2 years ago

Dear all,

I have to identify PCR duplicates by myself and would like to understand how tools like Picard's MarkDuplicates and Samtools' rmdup define them.

Do they require that both the start and the end alignment coordinates are the same? Since read quality usually degrades during the last sequencing cycles, I was thinking it would be better to define duplicates as reads sharing the same start coordinate, without also requiring them to share the end coordinate.

Is this how Picard/Samtools define them?


What I've found out so far: I do not have an exact answer, but it seems that the available software marks duplicates by comparing only the 5' coordinates (adjusted for clipping, if present). For paired-end data, the 5' coordinates of both the first and the second mate have to be identical to those of another read pair for the pair to be considered a duplicate.
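
To make the rule above concrete, here is a minimal Python sketch of how reads could be grouped by their clipping-adjusted 5' coordinate. It is not the code used by Picard or Samtools, only an illustration under my own assumptions: the function names (`unclipped_five_prime`, `group_duplicates`) and the tuple layout of a read are made up for this example, positions are 1-based as in the SAM POS field, and strand is included in the key because the 5' end of a reverse-strand read is its rightmost aligned base.

```python
import re
from collections import defaultdict

# Hypothetical helpers for illustration only; not Picard/Samtools code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")   # CIGAR operations that advance along the reference
CLIPS = set("SH")              # soft and hard clipping


def unclipped_five_prime(pos, cigar, reverse):
    """Return the 5' reference coordinate of a read, adjusted for clipping.

    pos     -- 1-based leftmost aligned position (SAM POS field)
    cigar   -- CIGAR string, e.g. "5S45M"
    reverse -- True if the read maps to the reverse strand
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if not reverse:
        # Forward strand: the 5' end is on the left, so subtract leading clips.
        leading = 0
        for n, op in ops:
            if op not in CLIPS:
                break
            leading += n
        return pos - leading
    # Reverse strand: the 5' end is on the right, so add trailing clips
    # to the last aligned reference position.
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    trailing = 0
    for n, op in reversed(ops):
        if op not in CLIPS:
            break
        trailing += n
    return pos + ref_len - 1 + trailing


def group_duplicates(reads):
    """Group single-end reads that share chromosome, strand and 5' coordinate.

    reads -- iterable of (name, chrom, pos, cigar, reverse) tuples
    Returns only the groups with more than one read (candidate duplicates).
    """
    groups = defaultdict(list)
    for name, chrom, pos, cigar, reverse in reads:
        key = (chrom, unclipped_five_prime(pos, cigar, reverse), reverse)
        groups[key].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


reads = [
    ("r1", "chr1", 100, "5S45M", False),  # soft-clipped at the 5' end
    ("r2", "chr1", 95, "50M", False),
    ("r3", "chr1", 95, "40M", False),     # same start, different end
    ("r4", "chr1", 95, "50M", True),      # reverse strand, 5' end on the right
]
duplicate_groups = group_duplicates(reads)
```

In this toy input, r1, r2 and r3 fall into the same group even though their end coordinates differ, while r4 does not. For paired-end data the same idea would extend to a key built from the 5' coordinates of both mates, as described above.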

