Question: Deduplication using UMI-tools
Ati wrote (9 months ago):

I have some RNA-seq data with a high duplication rate, but the reads carry UMIs (Unique Molecular Identifiers) that are 5 bp long. I used umi_tools dedup to remove duplicates. However, when I check the duplication rate with Picard MarkDuplicates, it is still fairly high for some samples.

I would expect a low, or even zero, duplication rate after running UMI-tools. Is there an explanation for this?

Could the length of UMI be the reason?

Thank you in advance!

Devon Ryan (Freiburg, Germany) wrote (9 months ago):

Picard should be completely ignored if you have UMIs. It doesn't use UMIs and will therefore report inflated duplication rates: Picard MarkDuplicates calls PCR duplicates from the positions of the read ends alone, whereas umi_tools uses that positional information plus the UMI sequence. Reads that share a position but carry different UMIs are kept by umi_tools as distinct molecules, yet Picard still flags them as duplicates. If you have run umi_tools dedup, then the actual duplication rate is 0, regardless of what Picard reports.
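To illustrate why the two tools disagree, here is a minimal Python sketch on made-up reads. It assumes a simplified read record (chromosome, 5' position, strand, UMI) and counts duplicates either by position alone (roughly how Picard MarkDuplicates groups reads) or by position plus UMI. It uses exact UMI matching only; umi_tools dedup by default uses a "directional" network method that also merges UMIs differing by sequencing errors, which this sketch does not model. The names Read, duplication_rate and dedup_by_position_and_umi are hypothetical.

```python
from collections import Counter
from typing import List, NamedTuple


class Read(NamedTuple):
    chrom: str
    pos: int      # 5' mapping position, assumed already strand-adjusted
    strand: str
    umi: str


def duplication_rate(reads: List[Read], use_umi: bool) -> float:
    """Fraction of reads flagged as duplicates: within each group one read
    is kept and every extra read counts as a duplicate."""
    if use_umi:
        keys = [(r.chrom, r.pos, r.strand, r.umi) for r in reads]
    else:
        # Position-only grouping, roughly what Picard MarkDuplicates does
        keys = [(r.chrom, r.pos, r.strand) for r in reads]
    groups = Counter(keys)
    duplicates = sum(n - 1 for n in groups.values())
    return duplicates / len(reads) if reads else 0.0


def dedup_by_position_and_umi(reads: List[Read]) -> List[Read]:
    """Keep the first read of each (position, UMI) group; exact UMI match only."""
    seen = set()
    kept = []
    for r in reads:
        key = (r.chrom, r.pos, r.strand, r.umi)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


reads = [
    Read("chr1", 100, "+", "ACGTA"),  # three PCR copies of one molecule
    Read("chr1", 100, "+", "ACGTA"),
    Read("chr1", 100, "+", "ACGTA"),
    Read("chr1", 100, "+", "TTGCA"),  # different molecules, same position
    Read("chr1", 100, "+", "GGCAT"),
    Read("chr1", 250, "-", "ACGTA"),
]

deduped = dedup_by_position_and_umi(reads)
print(len(deduped))                              # 4
print(duplication_rate(deduped, use_umi=False))  # 0.5 (position only)
print(duplication_rate(deduped, use_umi=True))   # 0.0 (position + UMI)
```

After UMI-aware deduplication, the position-only count still flags half of the remaining reads as duplicates, because three distinct molecules happen to start at chr1:100; the UMI-aware count is zero.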


Ati: @Devon Ryan Thank you! Even if the UMI is short (5 bp)?


Devon Ryan: Yes.
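On the UMI-length question, a rough back-of-the-envelope sketch: a 5 bp UMI allows 4^5 = 1024 possible sequences. Assuming UMIs are drawn uniformly at random and ignoring sequencing errors (both simplifying assumptions), the code below estimates how often distinct molecules starting at the same position end up sharing a UMI. A short UMI therefore tends to make UMI-aware deduplication collapse some genuinely distinct molecules (under-counting) rather than leave PCR duplicates behind, so it does not explain a high Picard duplication rate. The function names are hypothetical.

```python
def umi_space(umi_length: int) -> int:
    """Number of possible UMI sequences of a given length (A/C/G/T)."""
    return 4 ** umi_length


def collision_probability(n_molecules: int, umi_length: int) -> float:
    """Probability that at least two of n distinct molecules at one position
    receive the same UMI, assuming uniform random UMIs (birthday problem)."""
    k = umi_space(umi_length)
    if n_molecules > k:
        return 1.0  # pigeonhole: a collision is guaranteed
    p_all_unique = 1.0
    for i in range(n_molecules):
        p_all_unique *= (k - i) / k
    return 1.0 - p_all_unique


def expected_distinct_umis(n_molecules: int, umi_length: int) -> float:
    """Expected number of distinct UMIs among n molecules at one position,
    i.e. roughly how many survive exact-match UMI deduplication."""
    k = umi_space(umi_length)
    return k * (1.0 - (1.0 - 1.0 / k) ** n_molecules)


if __name__ == "__main__":
    for n in (10, 100, 500):
        print(n,
              round(collision_probability(n, 5), 3),
              round(expected_distinct_umis(n, 5), 1))
```

Collisions only matter at positions where many distinct molecules start (for example, highly expressed genes); even there, their effect is to merge molecules, not to leave duplicates.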


Ati: Thank you for your help!
