Hi everyone, I'm working on a term project involving read alignment tools, and I had a question regarding how these programs detect and report PCR duplicates.
As I understand it, a proportion of the reads flagged as PCR duplicates will be false positives. One read of a pair may have a sequence identical to other reads, but if its mate aligns to a different region of the genome, the pair is not a true PCR duplicate, since it would not originate from the same DNA fragment. Programs like FastQC consider only one read at a time, without looking at the paired-end data.
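To make the distinction concrete, here is a toy Python sketch of what I mean. The `Pair` tuple, read names, and coordinates are all made up for illustration, alignment position stands in for sequence identity, and this is not meant to reflect how any particular tool is implemented. It just shows that grouping on one read alone calls more duplicates than grouping on both mates.

```python
from collections import Counter, namedtuple

# Hypothetical aligned read pair: chromosome, 5' position and strand of each mate.
Pair = namedtuple("Pair", "name chrom1 pos1 strand1 chrom2 pos2 strand2")

pairs = [
    Pair("p1", "chr1", 100, "+", "chr1", 350, "-"),
    Pair("p2", "chr1", 100, "+", "chr1", 350, "-"),  # same fragment ends as p1
    Pair("p3", "chr1", 100, "+", "chr1", 420, "-"),  # same read 1, different mate
]

def single_end_key(p):
    """Key that looks only at read 1, ignoring the mate."""
    return (p.chrom1, p.pos1, p.strand1)

def paired_end_key(p):
    """Key that also includes the mate's position and strand."""
    return single_end_key(p) + (p.chrom2, p.pos2, p.strand2)

def count_duplicates(pairs, key):
    """Count every pair beyond the first within each group sharing a key."""
    groups = Counter(key(p) for p in pairs)
    return sum(n - 1 for n in groups.values())

single_end_dups = count_duplicates(pairs, single_end_key)
paired_end_dups = count_duplicates(pairs, paired_end_key)
print(single_end_dups, paired_end_dups)  # prints "2 1"
```

Here p3 is counted as a duplicate only when the mate is ignored, which is the kind of false positive I am asking about.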
But the SAM output from read alignment tools also contains a flag for PCR duplicates. When flagging a PCR duplicate, do read alignment tools look only at individual reads, or do they also take into account the position of the mate when the reads come from a paired-end library?
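For reference, the flag I mean is bit 0x400 (1024) of the SAM FLAG field, which marks a read as a PCR or optical duplicate. Below is a minimal sketch of checking it in plain Python. The SAM record is made up for illustration, and `parse_flag` is just a hypothetical helper, not part of any library.

```python
PAIRED = 0x1       # SAM FLAG bit: read is part of a pair
DUPLICATE = 0x400  # SAM FLAG bit: PCR or optical duplicate

def parse_flag(sam_line):
    """Return the integer FLAG (second tab-separated column) of a SAM record."""
    return int(sam_line.split("\t")[1])

# Made-up alignment record: FLAG 1123 = 1024 (duplicate) + 99 (paired,
# properly paired, mate on reverse strand, first in pair).
sam_line = "read1\t1123\tchr1\t100\t60\t50M\t=\t350\t300\t*\t*"

flag = parse_flag(sam_line)
is_paired = bool(flag & PAIRED)
is_duplicate = bool(flag & DUPLICATE)
print(is_paired, is_duplicate)  # prints "True True"
```

The bit itself only says that a read was marked as a duplicate. It does not say whether the mate was taken into account when the call was made, which is what I would like to understand.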
If anyone could give more insight into this, I would appreciate it!