Question: How to set a cutoff value when de-duplicating
5.5 years ago by chxu02 (United States):

I'm doing BS-seq on ChIP DNA. To get 500M reads from <1 ng of ChIP DNA, you can imagine that the duplication level is huge. FastQC reported duplication rates of 39% and 66% for my two libraries. In my case, I think the proper way to de-duplicate is to set a cutoff value, say 5, to tolerate some PCR duplication (and possibly amplification from distinct DNA fragments with identical ends). How can I do this in a customized way? The reads are paired-end. Ideally, I would start from an alignment file such as BAM/SAM.

sequencing alignment • 1.5k views
5.5 years ago by Devon Ryan (Freiburg, Germany):

There's no generally applicable way to deduplicate targeted sequencing data (the same is true for things like RRBS). You can set a threshold if you want, but then you'll have to tailor it to each experiment and write a program to do it yourself. Traditionally, one simply doesn't deduplicate such datasets, since many of the reads flagged as duplicates would be false positives.
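
For illustration only, here is a minimal sketch of what such a program could look like. It is not an established tool, and every name in it (`cap_duplicates`, `fragment_key`, `max_copies`) is hypothetical. It assumes SAM text as input, defines a fragment by the first mate's RNAME, POS, RNEXT, PNEXT, TLEN and strand, and keeps the first `max_copies` fragments per position in file order. It ignores soft clipping and any bisulfite-strand tags your aligner may write, and it drops fragments whose first mate is unmapped as well as single-end reads, so treat it as a starting point to tailor per experiment.

```python
from collections import defaultdict


def fragment_key(fields):
    """Position key for a fragment, built from the first mate's SAM fields.

    Assumption: two fragments are "duplicates" if RNAME, POS, RNEXT, PNEXT,
    TLEN and the strand bit (0x10) of the first mate all match.
    """
    flag = int(fields[1])
    return (fields[2], int(fields[3]), fields[6],
            int(fields[7]), int(fields[8]), bool(flag & 0x10))


def cap_duplicates(sam_lines, max_copies=5):
    """Keep at most `max_copies` fragments per position; return SAM lines.

    Header lines are passed through. Both mates of a kept fragment are kept,
    both mates of a dropped fragment are dropped.
    """
    lines = [ln.rstrip("\n") for ln in sam_lines]
    seen = defaultdict(int)   # fragment key -> number of fragments kept so far
    kept = set()              # read names (QNAME) of kept fragments

    # Pass 1: decide per fragment, looking only at the primary first mate.
    for ln in lines:
        if not ln or ln.startswith("@"):
            continue
        fields = ln.split("\t")
        flag = int(fields[1])
        is_first_mate = bool(flag & 0x40)
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if not is_first_mate or flag & (0x4 | 0x100 | 0x800):
            continue
        key = fragment_key(fields)
        if seen[key] < max_copies:
            seen[key] += 1
            kept.add(fields[0])

    # Pass 2: emit headers plus every record belonging to a kept fragment.
    return [ln for ln in lines
            if ln.startswith("@") or ln.split("\t")[0] in kept]
```

One way to use it would be to feed it the text output of `samtools view -h` on your BAM file and convert the returned lines back to BAM with `samtools view -b`. Keeping the first fragments in file order is an arbitrary choice; ranking by base quality, as some deduplication tools do, would need an extra sort within each group.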
