I have some RNA-seq data with a high duplication rate, but the reads carry UMIs (Unique Molecular Identifiers).
The UMIs are 5 bp long.
I used umi_tools dedup to remove duplicates. When I then checked the duplication rate with Picard's MarkDuplicates, it was still somewhat high for some samples.
I expected a low, or even zero, duplication rate after running UMI-tools. Is there an explanation for this?
Could the length of the UMI be the reason?
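
A rough sketch of the arithmetic behind the UMI-length question: a 5 bp UMI allows at most 4^5 distinct sequences, so molecules that map to the same position can share a UMI by chance. The function names below are hypothetical, and the calculation assumes UMIs are drawn uniformly and independently, which real libraries only approximate.

```python
def umi_space(length, alphabet=4):
    """Number of distinct UMI sequences of the given length (A/C/G/T by default)."""
    return alphabet ** length


def collision_probability(n_molecules, length):
    """Chance that at least two of n molecules at one position share a UMI.

    Assumption: every UMI sequence is equally likely and UMIs are independent.
    """
    space = umi_space(length)
    if n_molecules > space:
        return 1.0  # more molecules than UMIs, so a collision is certain
    p_all_unique = 1.0
    for i in range(n_molecules):
        p_all_unique *= (space - i) / space
    return 1.0 - p_all_unique


if __name__ == "__main__":
    print("distinct 5 bp UMIs:", umi_space(5))
    for n in (2, 10, 50, 200):
        print(n, "molecules:", round(collision_probability(n, 5), 4))
```

This only estimates how often distinct molecules would share a UMI by chance. It says nothing about how either tool decides what counts as a duplicate.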
Thank you in advance!