I have recently been working on a reduced representation bisulfite sequencing (RRBS) project where I had to run 15-18 cycles of PCR to get libraries to a sufficient concentration for sequencing. As a result, I ended up with a large fraction of duplicated sequences: 75-90% as determined by FastQC.

While 75-90% duplication is obviously an issue, I am having a hard time finding an "expected" range for the percentage of duplicated sequences in RRBS. Given that RRBS data is of low diversity by nature, while FastQC works under the assumption that libraries are highly diverse, I am curious at what point other people start considering deduplication steps. All I could find regarding this range was the FastQC example report for RRBS, which shows duplication levels of ~25% (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/RRBS_fastqc.html), and this Biostars post, which reports 40% of reads being duplicated in an RRBS experiment ("PCR duplicates in RRBS data").
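For context, here is roughly how I am thinking about the duplication percentage. This is a simplified exact-sequence count, not FastQC's actual estimator (which samples the first reads and truncates long ones before counting), so the numbers will differ from the report:

```python
from collections import Counter

def duplication_pct(seqs):
    """Percent of reads that are duplicates of an already-seen sequence
    (i.e. every copy beyond the first of each distinct sequence)."""
    counts = Counter(seqs)
    total = sum(counts.values())
    dupes = total - len(counts)  # reads beyond the first copy of each sequence
    return 100.0 * dupes / total

# toy example: 6 reads, 3 distinct sequences -> 50% duplicated
reads = ["CGGTA", "CGGTA", "CGGAT", "CGGAT", "CGGAT", "CGGCC"]
print(duplication_pct(reads))  # 50.0
```

Note that in RRBS even genuinely independent fragments start at the same MspI cut sites, so identical sequences are not necessarily PCR duplicates, which is part of why I am unsure how to interpret my numbers.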
Does anyone have a reference for a specific range, or any insight into what could be considered "normal" levels of duplication for RRBS?
Thank you in advance!