How to set a cutoff value when de-duplicating
9.2 years ago
chxu02

I'm doing BS-seq on some ChIP DNA. Since I'm getting 500M reads from <1 ng of ChIP DNA, you can imagine the duplication level is HUGE: FastQC reported duplication rates of 39% and 66% for my two libraries. In my case, I think the proper way to de-duplicate is to set a cutoff value, say 5, so that some PCR duplication (and possibly distinct DNA fragments that happen to have identical ends) is tolerated. How can I do this in a customized way? The reads are paired-end, and ideally I'd start from an alignment file such as BAM/SAM.

9.2 years ago

There's no generally applicable way to deduplicate targeted sequencing data (the same is true for things like RRBS). You can set a threshold if you want, but you'll have to tailor it to each experiment and write your own program to apply it. Traditionally, one simply doesn't deduplicate such datasets, since many of the reads flagged as duplicates would be false positives: independent fragments that happen to share the same ends.
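If you do write your own program, the core of a capped deduplication is small. Below is a minimal sketch, not a tested tool: it assumes each properly paired fragment has already been reduced to a hypothetical `ReadPair` record (chromosome, fragment start and end, strand of read 1). Extracting those values from a BAM file, for instance with pysam's `AlignedSegment` attributes such as `reference_start` and `template_length`, is not shown. Pairs sharing the same key count as duplicates, and only the first `max_copies` of each group are kept. Both the key definition and the cutoff of 5 are assumptions you would tune per experiment.

```python
from collections import defaultdict
from typing import Iterable, Iterator, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical, minimal representation of one paired-end fragment."""
    name: str
    chrom: str
    start: int   # leftmost mapped position of the fragment (0-based)
    end: int     # rightmost mapped position of the fragment (exclusive)
    strand: str  # '+' or '-', taken from read 1 (keeps BS-seq strands apart)


def capped_dedup(pairs: Iterable[ReadPair], max_copies: int = 5) -> Iterator[ReadPair]:
    """Yield at most `max_copies` fragments per (chrom, start, end, strand) key.

    Fragments are kept in input order, so the first `max_copies` seen for a
    key survive and any further copies are dropped.
    """
    if max_copies < 1:
        raise ValueError("max_copies must be >= 1")
    counts = defaultdict(int)  # duplicate key -> number of fragments seen so far
    for pair in pairs:
        key = (pair.chrom, pair.start, pair.end, pair.strand)
        counts[key] += 1
        if counts[key] <= max_copies:
            yield pair


if __name__ == "__main__":
    # Seven identical fragments on '+' plus one with the same ends on '-'.
    demo = [ReadPair(f"r{i}", "chr1", 100, 350, "+") for i in range(7)]
    demo.append(ReadPair("x", "chr1", 100, 350, "-"))
    for kept in capped_dedup(demo, max_copies=5):
        print(kept.name)
```

With `max_copies=5`, the demo keeps `r0` to `r4` plus `x`, because the '-' fragment has a different key. Keeping the first copies is arbitrary; you could instead keep the copies with the highest mapping quality. For hundreds of millions of pairs, the dictionary will grow large, so processing a coordinate-sorted file one chromosome at a time and clearing `counts` between chromosomes would keep memory bounded.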
