Picard MarkDuplicates help – how to find the number of duplicates removed
2.4 years ago
urvashi_s • 0

Hello,

It is my first time using Picard to remove duplicates, and here are some of the duplication metrics:

ESTIMATED_LIBRARY_SIZE: 24195090

PERCENT_DUPLICATION: 0.707214 (so ~70.7%)

READ_PAIR_DUPLICATES: 55689797 (~55.7M)

Histogram BIN1: 8692285 (so this is essentially the fragments present a single time)

In the metrics or the output file, how can I find the number of reads/fragments that have been removed?

Any help would be appreciated.

mark rna-seq picard duplicates • 972 views
2.4 years ago
uli • 0

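From the metrics, the number of duplicates detected is READ_PAIR_DUPLICATES: about 55.7M read pairs, i.e. roughly 111M reads, plus any unpaired duplicates counted separately in UNPAIRED_READ_DUPLICATES. You can find more information on the Picard outputs here: https://broadinstitute.github.io/picard/picard-metric-definitions.html#DuplicationMetrics. In default mode the program only marks duplicates (it sets the duplicate flag, 0x400, on those reads) and leaves them in the output file, so nothing has been removed yet. To actually remove them you would need to add REMOVE_DUPLICATES=true.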
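If it helps, here is a minimal Python sketch of how the total could be pulled out of the metrics file, assuming the usual layout (a "## METRICS CLASS" comment line, then a tab-separated header row and one data row per library, then a blank line before the histogram). The function names and the sample values are made up for illustration and are not from the original post; the sample also lists only a subset of the columns a real file contains.

```python
import csv
import io


def read_duplication_metrics(text):
    """Return one dict per library row of the METRICS CLASS table."""
    lines = text.splitlines()
    # The table starts on the line after the "## METRICS CLASS" comment.
    start = next(
        i for i, line in enumerate(lines) if line.startswith("## METRICS CLASS")
    ) + 1
    table = []
    for line in lines[start:]:
        if not line.strip():  # a blank line ends the table; the histogram follows
            break
        table.append(line)
    return list(csv.DictReader(io.StringIO("\n".join(table)), delimiter="\t"))


def duplicate_reads(row):
    """Reads flagged as duplicates: unpaired duplicates plus two reads per duplicate pair."""
    return int(row["UNPAIRED_READ_DUPLICATES"]) + 2 * int(row["READ_PAIR_DUPLICATES"])


# Hypothetical, tiny metrics file (illustrative numbers only).
header = ["LIBRARY", "UNPAIRED_READS_EXAMINED", "READ_PAIRS_EXAMINED",
          "UNPAIRED_READ_DUPLICATES", "READ_PAIR_DUPLICATES"]
values = ["demo", "10", "100", "4", "30"]
sample = "\n".join([
    "## METRICS CLASS\tpicard.sam.DuplicationMetrics",
    "\t".join(header),
    "\t".join(values),
    "",
    "## HISTOGRAM\tjava.lang.Double",
    "BIN\tVALUE",
    "1.0\t70",
])

rows = read_duplication_metrics(sample)
print(rows[0]["LIBRARY"], duplicate_reads(rows[0]))  # demo 64
```

With REMOVE_DUPLICATES=true, this count is the number of reads you would expect to be missing from the output relative to a run that only marks duplicates.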

Particularly for RNA-seq, though, whether to remove PCR duplicates at all is still debated, and removing them can sometimes do more harm than good: https://www.nature.com/articles/srep25533.
