Question: Library Duplicates vs. Optical Duplicates (Picard MarkDuplicates)
7.6 years ago, toni (Lyon) wrote:

Hi all,

I am analyzing an Illumina paired-end sequencing experiment. I would like to track the duplicates in my lanes and be able to distinguish between PCR duplicates and optical duplicates.

For this purpose, I use Picard MarkDuplicates. It has an OPTICAL_DUPLICATE_PIXEL_DISTANCE parameter, which is nice, but since the tool simply sets a flag to true in the sorted BAM file, there is no way in the end to distinguish between the two. (Am I right?)

So, basically, I am wondering whether this option is really useful. It is explained that MarkDuplicates starts by finding the 5' coordinates and mapping orientations of each read pair, so looking at the coordinates of the cluster on the flowcell seems unnecessary (?), as the pair will be tagged as a duplicate anyway.

Do you use an in-house script or a particular API for this?

Cheers, Tony

EDIT: I am aware that Picard creates a metrics file to report some values. But in some lanes generated with a PCR-free protocol, I expected a proportion of my duplicates to be optical duplicates. Nevertheless, in the Picard metrics file, I always have %optical_dup=0. So I am wondering whether some of you have had issues with this measure as well.

7.5 years ago, toni (Lyon) wrote:

I had neglected to look closely at the read names in my files. A read name has the following format:

 @identifier:lane:tile:x:y

By default, Picard only matches letters and digits in the 'identifier' part. So if your read names contain underscores (which is actually quite common), Picard cannot recover the coordinates, and no optical duplicates will show up.
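To illustrate the failure described above, here is a minimal Python sketch. The pattern called ASSUMED_DEFAULT is an assumption about what the default looked like in Picard versions of that era (check the documentation of your own version), and RELAXED, parse_coords and the sample read names are hypothetical names made up for this example. Read names are written without the leading "@", as they appear in a BAM file.

```python
import re

# Assumption: the default pattern only allowed letters and digits
# in the identifier part. Verify against your Picard version.
ASSUMED_DEFAULT = r"[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*"

# Hypothetical relaxed pattern that also accepts underscores.
RELAXED = r"[a-zA-Z0-9_]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*"


def parse_coords(read_name, pattern):
    """Return (tile, x, y) if the whole read name matches, else None."""
    match = re.fullmatch(pattern, read_name)
    if match is None:
        return None
    tile, x, y = (int(group) for group in match.groups())
    return tile, x, y


# An identifier with an underscore is rejected by the strict pattern...
print(parse_coords("MACHINE_1:6:73:941:1973", ASSUMED_DEFAULT))
# ...but the relaxed pattern recovers tile, x and y.
print(parse_coords("MACHINE_1:6:73:941:1973", RELAXED))
```

When the name does not match, no coordinates are available, which is consistent with an optical duplicate count of zero in the metrics file.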

Use the READ_NAME_REGEX option of MarkDuplicates to customize the read name matching.
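Once the coordinates can be parsed, the role of OPTICAL_DUPLICATE_PIXEL_DISTANCE becomes clearer: among reads already grouped as duplicates, those whose clusters lie close together on the same tile are counted as optical. The following is a simplified sketch of that idea, not Picard's actual implementation. It assumes a per-axis distance comparison and a default distance of 100 pixels; Cluster and count_optical_duplicates are hypothetical names.

```python
from typing import NamedTuple


class Cluster(NamedTuple):
    tile: int
    x: int
    y: int


def count_optical_duplicates(group, max_pixel_distance=100):
    """Count reads in one duplicate group that lie within
    max_pixel_distance (on both axes) of an earlier read on the same tile."""
    optical = 0
    for i, current in enumerate(group):
        for earlier in group[:i]:
            if (
                current.tile == earlier.tile
                and abs(current.x - earlier.x) <= max_pixel_distance
                and abs(current.y - earlier.y) <= max_pixel_distance
            ):
                optical += 1
                break  # count each read at most once
    return optical


# Three reads already identified as duplicates of one another:
# two neighbouring clusters on tile 73 and one cluster on tile 12.
group = [Cluster(73, 941, 1973), Cluster(73, 960, 1990), Cluster(12, 941, 1973)]
print(count_optical_duplicates(group))
```

In this sketch the second read counts as optical because it sits next to the first one on the same tile, while the third does not because it lies on a different tile.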
