Question: Picard tools duplicate removal
blur90 wrote:

Hi, I want to use the MarkDuplicates tool from Picard, but after reading the manual I am still not sure I understand the method it uses: http://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates The manual reads: "The MarkDuplicates tool works by comparing sequences in the 5 prime positions of both reads and read-pairs in a SAM/BAM file"

Does this mean duplicates are marked based on their chromosome and start position plus the 5' sequence? Or does the tool take the full sequence into account by using the CIGAR data?
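To make the distinction in the question concrete, here is a minimal Python sketch of a position-based duplicate key. It is an illustration only, not Picard's code: the function names `unclipped_five_prime` and `duplicate_key` are hypothetical, and the sketch assumes (as I understand the tool's behaviour) that reads are grouped by reference name, strand, and the 5' coordinate extended over any clipping, with the CIGAR used only to compute that coordinate and the read sequence itself never compared.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def unclipped_five_prime(pos, cigar, reverse):
    """Return the 1-based reference coordinate of the read's 5' end,
    extended over soft/hard clipping at that end.

    pos is the leftmost aligned position (the SAM POS field)."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if not reverse:
        # Forward strand: the 5' end is on the left, so subtract leading clips.
        lead = 0
        for n, op in ops:
            if op not in "SH":
                break
            lead += n
        return pos - lead
    # Reverse strand: the 5' end is on the right, so add trailing clips
    # to the last aligned reference position.
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    trail = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trail += n
    return pos + ref_len - 1 + trail


def duplicate_key(chrom, pos, cigar, reverse):
    """Hypothetical grouping key: reads sharing a key are duplicate candidates."""
    return (chrom, unclipped_five_prime(pos, cigar, reverse), reverse)


# Two forward reads with different POS and CIGAR strings, but the same
# unclipped 5' coordinate, end up with the same key.
a = duplicate_key("chr1", 100, "5S45M", False)
b = duplicate_key("chr1", 95, "50M", False)
print(a == b)
```

Under these assumptions, two reads can be flagged as duplicates of each other even if their sequences differ, which is why the choice matters for high-coverage data.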

Thanks in advance.

rna-seq picard-tools • 1.7k views
written 8 months ago by blur90

Will the answer to this question influence your decision to use it or not in any way?

written 8 months ago by YaGalbi 1.3k

Yes. Duplicate removal has influenced my results dramatically in the past.

written 8 months ago by blur90

I hope you do not want to remove duplicates from RNA-seq data, as the tags of your post suggest?

written 8 months ago by ATpoint 4.3k

That is exactly why this operation is so dangerous. You had better be sure that the removed duplicates are all artificial and not a natural effect of high coverage.

There is a common myth floating around that "duplicate" is a synonym for "error". That is a remnant of the past, when coverage was typically low.

modified 8 months ago • written 8 months ago by Istvan Albert ♦♦ 77k

Keeping @ATpoint's note in mind, if you do want to remove PCR/optical duplicates for other reasons, then use Clumpify (see "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files"). It works directly from the sequences and does not need the data to be aligned.
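To illustrate what "works from sequences" means, here is a deliberately naive Python sketch that drops reads whose sequence has already been seen, with no alignment involved. This is not Clumpify's algorithm, which is considerably more sophisticated; the function name `dedupe_by_sequence` and the `(name, sequence)` record layout are assumptions made for the example.

```python
def dedupe_by_sequence(records):
    """Keep the first record seen for each distinct sequence.

    records: iterable of (name, sequence) tuples (hypothetical layout)."""
    seen = set()
    kept = []
    for name, seq in records:
        if seq not in seen:
            seen.add(seq)
            kept.append((name, seq))
    return kept


reads = [
    ("read1", "ACGTACGT"),
    ("read2", "ACGTACGT"),  # exact copy of read1, so it is dropped
    ("read3", "ACGTACGA"),  # differs by one base, so it is kept
]
for name, seq in dedupe_by_sequence(reads):
    print(name, seq)
```

Note that exact matching like this treats any read with a single sequencing error as unique, so it should be read as a conceptual contrast with position-based marking rather than a usable deduplicator.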

modified 8 months ago • written 8 months ago by genomax 49k