In our lab, we are using ddRAD sequencing with a degenerate base region (DBR) to deal with PCR duplicates. We add the DBR on the Read 2 (paired-end) side. Here is an example (the DBR is in square brackets):
The sequence shows the genomic DNA ("...N...N"), some bases from the enzyme cut site, the DBR [in brackets], and the rest of Read 2. We also have an individual barcode in R1 (not shown) and an index in R2, which is recorded in the FASTQ header.
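To make the layout concrete, here is a minimal Python sketch of pulling the DBR out of Read 2 before any deduplication. The DBR offset (`DBR_START`) and length (`DBR_LEN`) are hypothetical placeholders, not values from our protocol; they would have to match the actual adapter design. The parser also assumes plain four-line FASTQ records.

```python
import io
from typing import Iterator, NamedTuple, TextIO

# Hypothetical layout values: set these to match the actual adapter design.
DBR_START = 0  # assumed 0-based position of the DBR within Read 2
DBR_LEN = 8    # assumed DBR length in bases


class Read2(NamedTuple):
    header: str
    dbr: str
    seq: str   # Read 2 sequence with the DBR cut out
    qual: str  # quality string trimmed to match seq


def parse_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def split_dbr(header: str, seq: str, qual: str) -> Read2:
    """Remove the DBR from a Read 2 sequence and its quality string."""
    end = DBR_START + DBR_LEN
    return Read2(
        header=header,
        dbr=seq[DBR_START:end],
        seq=seq[:DBR_START] + seq[end:],
        qual=qual[:DBR_START] + qual[end:],
    )


if __name__ == "__main__":
    fq = io.StringIO("@read1\nACGTACGTTTGCAGGA\n+\nIIIIIIIIIIIIIIII\n")
    for record in parse_fastq(fq):
        r2 = split_dbr(*record)
        print(r2.dbr, r2.seq)  # ACGTACGT TTGCAGGA
```

The DBR is kept as its own field rather than discarded, because it is the only thing a later deduplication step can use to tell independent molecules apart.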
My questions are:
- How can we deal with DBRs bioinformatically? Is there software that can handle them?
- Stacks seems to offer options for this. Judging from its manual, the oligo sequence options of clone_filter could do the trick, but it's unclear how to apply them or whether they are designed for DBRs at all. Has anyone used them before? How do these options deal with duplicates?
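For reference, this is how we currently picture DBR-based deduplication, written as a minimal Python sketch. It is our own assumption, not the algorithm used by Stacks or by the published scripts. Because both ends of a ddRAD fragment are fixed by restriction sites, independent molecules from the same allele give identical read pairs, so only pairs that share both their sequence and their DBR are collapsed. `Pair`, `collapse_pcr_duplicates`, and the keep-the-highest-quality-copy rule are hypothetical choices for illustration.

```python
from typing import Iterable, NamedTuple


class Pair(NamedTuple):
    r1_seq: str  # Read 1 after barcode removal
    r2_seq: str  # Read 2 with the DBR already removed
    dbr: str     # degenerate base region taken from Read 2
    r1_qual: str = ""
    r2_qual: str = ""


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def collapse_pcr_duplicates(pairs: Iterable[Pair]) -> list[Pair]:
    """Keep one read pair per (Read 1, Read 2, DBR) combination.

    Pairs with identical sequences but different DBRs are kept as separate
    molecules; pairs that also share the DBR are treated as PCR duplicates,
    and only the copy with the highest mean quality is retained.
    """
    best: dict[tuple[str, str, str], Pair] = {}
    for pair in pairs:
        key = (pair.r1_seq, pair.r2_seq, pair.dbr)
        kept = best.get(key)
        score = mean_quality(pair.r1_qual + pair.r2_qual)
        if kept is None or score > mean_quality(kept.r1_qual + kept.r2_qual):
            best[key] = pair
    return list(best.values())


if __name__ == "__main__":
    reads = [
        Pair("TGCAGG", "AATTCC", "ACGTACGT", "IIIIII", "IIIIII"),
        Pair("TGCAGG", "AATTCC", "ACGTACGT", "######", "######"),  # PCR duplicate
        Pair("TGCAGG", "AATTCC", "TTGGCCAA", "IIIIII", "IIIIII"),  # new molecule
    ]
    unique = collapse_pcr_duplicates(reads)
    print(f"{len(reads)} pairs -> {len(unique)} unique molecules")  # 3 pairs -> 2 unique molecules
```

Exact matching is the weak point of this sketch: a single sequencing error in the DBR or in either read produces a spurious "new" molecule, which a real tool would probably need to tolerate.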
Here are some relevant references:
Schweyen, H., Rozenberg, A., & Leese, F. (2014). Detection and removal of PCR duplicates in population genomic ddRAD studies by addition of a degenerate base region (DBR) in sequencing adapters. The Biological Bulletin, 227(2), 146-160. http://doi.org/10.1086/BBLv227n2p146
Tin, M. M. Y., Rheindt, F. E., Cros, E., & Mikheyev, A. S. (2014). Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy. Molecular Ecology Resources, 15(2), 329-336. http://doi.org/10.1111/1755-0998.12314
Both papers provide scripts for handling DBRs, but we are unsure how the algorithms work, especially where they "chop" the sequences into chunks and analyze each chunk individually.
I've looked at other posts on PCR duplicates, and it does seem that there is a way to deal with PCR duplicates using DBRs.