I am analyzing an Illumina paired-end sequencing experiment. I would like to track the duplicates in my lanes and distinguish PCR duplicates from optical duplicates.
For this purpose, I use Picard MarkDuplicates. It has an OPTICAL_DUPLICATE_PIXEL_DISTANCE parameter, which looks promising, but since the tool simply sets the duplicate flag in the sorted BAM file, there seems to be no way afterwards to tell the two kinds of duplicates apart. (Am I right?)
So I am wondering whether this option is really useful. It is explained that MarkDuplicates first finds the 5' coordinates and mapping orientations of each read pair, so also looking at the coordinates of the cluster on the flowcell seems unnecessary (?), as the pair will be flagged as a duplicate anyway.
Do you use an in-house script or a particular API for this?
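The kind of in-house check I have in mind would be roughly the following. It is only a sketch of my understanding, not Picard's actual implementation. It assumes that the last three colon-separated fields of an Illumina read name are tile, x and y, that all reads in a duplicate set come from the same lane, and that two clusters count as "optical" when they sit on the same tile within the pixel distance on both axes. The function names `parse_cluster` and `optical_flags` and the read names in the usage lines are made up for illustration.

```python
from itertools import combinations


def parse_cluster(read_name):
    """Return (tile, x, y) from the last three ':'-separated fields, or None."""
    fields = read_name.split()[0].split(":")
    if len(fields) < 3:
        return None
    try:
        tile, x, y = (int(f) for f in fields[-3:])
    except ValueError:
        return None  # read name does not carry numeric cluster coordinates
    return tile, x, y


def optical_flags(read_names, pixel_distance=100):
    """For one set of duplicate reads, flag those close to an earlier member.

    A read is flagged when it lies on the same tile as an earlier read in the
    list and within pixel_distance of it on both x and y. Unflagged duplicates
    would then be treated as PCR duplicates.
    """
    locs = [parse_cluster(name) for name in read_names]
    flags = [False] * len(read_names)
    for i, j in combinations(range(len(locs)), 2):
        a, b = locs[i], locs[j]
        if a is None or b is None:
            continue  # unparseable names can never be called optical
        same_tile = a[0] == b[0]
        close = (abs(a[1] - b[1]) <= pixel_distance
                 and abs(a[2] - b[2]) <= pixel_distance)
        if same_tile and close:
            flags[j] = True  # keep the earlier read as the "original"
    return flags


# Hypothetical read names belonging to one duplicate set.
names = [
    "M1:1:FC:1:1101:1000:2000",
    "M1:1:FC:1:1101:1050:2030",  # same tile, 50 and 30 pixels away
    "M1:1:FC:1:2203:1000:2000",  # different tile
]
print(optical_flags(names))
```

One thing this makes visible: if the read names cannot be parsed into coordinates, no duplicate can ever be called optical, whatever the pixel distance is.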
EDIT: I am aware that Picard writes a metrics file reporting some values. But in some lanes generated with a PCR-free protocol, I expected a proportion of my duplicates to be optical duplicates. Nevertheless, in the Picard metrics file I always get %optical_dup=0. So I am wondering whether some of you have had issues with this measure as well.
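For anyone who wants to compare numbers, this is roughly how the optical fraction can be pulled out of the metrics file. It is a sketch that assumes the file has a tab-separated table right after the `## METRICS CLASS` line, with `READ_PAIR_DUPLICATES` and `READ_PAIR_OPTICAL_DUPLICATES` columns. The helper names and the sample values are invented for illustration, not taken from my data.

```python
import csv
import io


def read_duplication_metrics(text):
    """Return one dict per library from the text of a MarkDuplicates metrics file."""
    lines = text.splitlines()
    # The table header is the line right after the "## METRICS CLASS" line.
    start = next(i for i, line in enumerate(lines)
                 if line.startswith("## METRICS CLASS")) + 1
    table = []
    for line in lines[start:]:
        if not line.strip():
            break  # a blank line ends the metrics table
        table.append(line)
    return list(csv.DictReader(io.StringIO("\n".join(table)), delimiter="\t"))


def optical_fraction(row):
    """Fraction of duplicate read pairs that were reported as optical."""
    duplicates = int(row["READ_PAIR_DUPLICATES"])
    optical = int(row["READ_PAIR_OPTICAL_DUPLICATES"])
    return optical / duplicates if duplicates else 0.0


# Made-up sample with only the columns used here.
sample = "\n".join([
    "## METRICS CLASS\tpicard.sam.DuplicationMetrics",
    "LIBRARY\tREAD_PAIR_DUPLICATES\tREAD_PAIR_OPTICAL_DUPLICATES",
    "libA\t200\t50",
    "",
    "## HISTOGRAM\tjava.lang.Double",
])
rows = read_duplication_metrics(sample)
print(rows[0]["LIBRARY"], optical_fraction(rows[0]))
```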