Hello, I am using UMI-tools to deduplicate some RNA-seq samples. The duplication levels are high because of the protocol used for library preparation. However, after running UMI-tools, the duplication level is still above 20% in every sample. Is this normal, or should I change some UMI-tools parameters?
Many thanks, Goren
P.S.: In the attached image you can see that, for example, sample "CCI-121" has 96% duplicates before running UMI-tools and 76% after deduplication. I measured the duplication levels with Picard.
UMI-tools command:
umi_tools dedup -I /input.bam --output-stats=/output/directory/stats -S /output.bam --buffer-whole-contig
Picard command:
picard MarkDuplicates REFERENCE_SEQUENCE=/reference/genome INPUT=/input.bam OUTPUT=marked_duplicates.bam METRICS_FILE=duplication_metrics.txt REMOVE_SEQUENCING_DUPLICATES=FALSE ASSUME_SORTED=TRUE CREATE_INDEX=TRUE
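For reference, the percentage Picard reports is the fraction of examined reads that were flagged as duplicates, with each read pair counting as two reads. The minimal Python sketch below reproduces that arithmetic, assuming the field names from Picard's duplication metrics file (UNPAIRED_READS_EXAMINED, READ_PAIRS_EXAMINED, UNPAIRED_READ_DUPLICATES, READ_PAIR_DUPLICATES). The function name and the read counts are hypothetical and were chosen only to reproduce the 96% and 76% figures. They are not taken from the actual samples.

```python
def percent_duplication(unpaired_examined, pairs_examined,
                        unpaired_dups, pair_dups):
    """Fraction of examined reads flagged as duplicates.

    Assumes the PERCENT_DUPLICATION definition used in Picard's
    duplication metrics: each read pair contributes two reads to
    both the numerator and the denominator.
    """
    reads_examined = unpaired_examined + 2 * pairs_examined
    if reads_examined == 0:
        return 0.0  # nothing examined, so nothing can be duplicated
    return (unpaired_dups + 2 * pair_dups) / reads_examined


# Hypothetical paired-end counts, for illustration only.
before = percent_duplication(0, 1_000_000, 0, 960_000)
after = percent_duplication(0, 100_000, 0, 76_000)
print(f"before: {before:.0%}, after: {after:.0%}")
```

Running it prints `before: 96%, after: 76%`.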