Question

Deduplication using UMItools

4.3 years ago

Ati ▴ 50

I have some RNA-seq data with a high duplication rate, but the reads carry UMIs (Unique Molecular Identifiers) that are 5 bp long. I used umi_tools dedup to remove duplicates. When I then checked the result with Picard's MarkDuplicates, the duplication rate was still fairly high for some samples.

I would expect a low, or even zero, duplication rate after running umi_tools dedup. Is there an explanation?

Could the short UMI length be the reason?

Thank you in advance!

RNA-Seq bam umitools duplication Picard • 2.3k views

score 1 · Answer 1 · 2020-01-06

4.3 years ago

Devon Ryan 104k

Ignore Picard's duplication metrics if you have UMIs. By default, Picard MarkDuplicates identifies PCR duplicates only from the positions of the read ends, whereas umi_tools uses those positions together with the UMI sequence, so Picard will report inflated duplication rates. If you have run umi_tools dedup, the actual duplication rate is 0, regardless of what Picard reports.
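To make the difference concrete, here is a minimal toy sketch in Python. It is not how either tool is implemented: the reads are made-up tuples, the function names are hypothetical, and UMIs are compared by exact match only (umi_tools can also group near-identical UMIs, which this sketch ignores). It only shows why a position-only check still flags reads after UMI-aware deduplication.

```python
from collections import defaultdict

# Toy reads as (chrom, start, strand, umi). All values are invented for illustration.
reads = [
    ("chr1", 100, "+", "ACGTA"),
    ("chr1", 100, "+", "ACGTA"),  # same position, same UMI: a PCR duplicate
    ("chr1", 100, "+", "TTGCA"),  # same position, different UMI: a distinct molecule
    ("chr1", 100, "+", "GGATC"),  # same position, different UMI: a distinct molecule
    ("chr1", 250, "-", "ACGTA"),
]


def dedup_by_position_and_umi(reads):
    """Keep the first read for each (chrom, start, strand, umi) combination."""
    seen = set()
    kept = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            kept.append(read)
    return kept


def position_only_duplicate_rate(reads):
    """Fraction of reads that a position-only check would flag as duplicates."""
    counts = defaultdict(int)
    for chrom, start, strand, _umi in reads:
        counts[(chrom, start, strand)] += 1
    # Within each position group, every read beyond the first is flagged.
    flagged = sum(n - 1 for n in counts.values())
    return flagged / len(reads)


kept = dedup_by_position_and_umi(reads)
print(len(kept))                           # reads left after UMI-aware dedup
print(position_only_duplicate_rate(kept))  # still non-zero for a position-only check
```

In this toy data, three distinct molecules remain at chr1:100 after UMI-aware deduplication, and a position-only check still counts two of them as duplicates even though each has its own UMI.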

@Devon Ryan Thank you! Does that hold even if the UMI is short (5 bp)?

Reply • 4.3 years ago by Ati ▴ 50

Yes.

Reply • 4.3 years ago by Devon Ryan 104k

Thank you for your help!

Reply • 4.3 years ago by Ati ▴ 50