Question: proximal duplicate parameters - HiSeq X
2.5 years ago by
Richard560 wrote:

Hi all,

We are doing some QC on our HiSeq X genome data. We see an increased rate of duplicates compared with the same libraries sequenced on other platforms. This is a known phenomenon, as mentioned in this post on the GATK website:

However, we're trying to work something out on our end, and I find myself wondering whether these Picard MarkDuplicates parameters are appropriate for data from the HiSeq X:

READ_NAME_REGEX=[a-zA-Z0-9_]+:[0-9]+:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500

example read name: HISEQX2_43:2:1115:17188:49513
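
To check what that regex actually pulls out of the example read name, here is a minimal Python sketch. The helper name `parse_read_name` is made up for illustration, and it assumes the three capture groups are read as tile, x and y, which is how Picard documents READ_NAME_REGEX. Python's `re` module stands in for Picard's own regex handling here, so treat it as a sanity check rather than a reproduction of what MarkDuplicates does internally.

```python
import re

# Regex quoted from the post (the READ_NAME_REGEX value passed to MarkDuplicates).
READ_NAME_REGEX = r"[a-zA-Z0-9_]+:[0-9]+:([0-9]+):([0-9]+):([0-9]+).*"


def parse_read_name(name):
    """Hypothetical helper: return (tile, x, y) as integers, or None if no match."""
    m = re.match(READ_NAME_REGEX, name)
    if m is None:
        return None
    tile, x, y = (int(g) for g in m.groups())
    return tile, x, y


# Example read name from the post.
print(parse_read_name("HISEQX2_43:2:1115:17188:49513"))
```

If a read name does not match, the helper returns None, which is a quick way to spot names the regex would not cover.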

I'm going to do some testing on the OPTICAL_DUPLICATE_PIXEL_DISTANCE value, but I'd like to hear whether anyone uses a different value from the one above (2500).


I ran some test HiSeq X data through a number of pixel distances. A distance of about 2500 does seem to be the saturation point for identifying duplicates as optical.
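
The kind of sweep described here can be sketched roughly as follows. This is only an illustration: the coordinates are invented, the function names are hypothetical, and the rule used (same tile, with both the x and y offsets within the distance) is an assumption about how the pixel distance is applied, not a copy of Picard's code.

```python
def is_proximal(a, b, max_dist):
    """a and b are (tile, x, y) tuples.

    Assumed rule: same tile, and both axis offsets within max_dist.
    """
    same_tile = a[0] == b[0]
    return same_tile and abs(a[1] - b[1]) <= max_dist and abs(a[2] - b[2]) <= max_dist


def count_proximal(pairs, max_dist):
    """Count the duplicate pairs that would be flagged as proximal at max_dist."""
    return sum(1 for a, b in pairs if is_proximal(a, b, max_dist))


# Made-up duplicate pairs for illustration only (not real HiSeq X data).
pairs = [
    ((1115, 17188, 49513), (1115, 17200, 49600)),  # close together on one tile
    ((1115, 5000, 10000), (1115, 6500, 10100)),    # same tile, x offset 1500
    ((1115, 5000, 10000), (1115, 5000, 40000)),    # same tile, far apart in y
    ((1115, 100, 100), (1116, 100, 100)),          # different tiles
]

# Sweep candidate distances and watch where the count stops growing.
for dist in (100, 1000, 2500, 5000):
    print(dist, count_proximal(pairs, dist))
```

On real data, the distance beyond which the count stops rising is the saturation point mentioned above.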

modified 2.5 years ago • written 2.5 years ago by Richard560
2.5 years ago by
United States
genomax70k wrote:

I suggest you give the duplicate-detection tool from BBMap a try to identify duplicates (optical, PCR and other kinds). There is a detailed thread available on SeqAnswers. It does not require aligned data, so you do not need to align your reads first. You can use the parameters spantiles=f dupedist=2500 (which work for HiSeq 4K and should be appropriate for HiSeq X).
