Question: Picard tools duplicate removal
Asked 14 months ago by blur90 (European Union):

Hi, I want to use the MarkDuplicates tool from Picard, but after reading the manual I am still not sure I understand the method it uses. The documentation (http://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates) says: "The MarkDuplicates tool works by comparing sequences in the 5 prime positions of both reads and read-pairs in a SAM/BAM file"

Does this mean duplicates are marked based only on chromosome, start position and the 5' sequence? Or does the tool take the full sequence into account by using the CIGAR string?

Thanks in advance.

Tags: rna-seq, picard-tools
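To make the "5 prime position" idea concrete, here is a minimal Python sketch of position-based duplicate marking for single-end reads. It assumes, as Picard is generally understood to work, that duplicates are defined by chromosome, strand and the *unclipped* 5' alignment coordinate (clipped bases at the 5' end are added back using the CIGAR string), not by comparing full read sequences. It is not Picard's implementation: the real tool also uses both mates' positions for pairs, separates libraries, detects optical duplicates, and scores which copy to keep in its own way. Here the copy with the highest summed base quality is kept, which is only an approximation. `Read`, `unclipped_five_prime` and `mark_duplicates` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import List, NamedTuple, Set

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR ops that advance along the reference
CLIPS = set("SH")             # soft and hard clips


class Read(NamedTuple):
    name: str
    chrom: str
    pos: int           # SAM POS: 1-based leftmost *aligned* position
    reverse: bool      # True if the read maps to the reverse strand
    cigar: str
    quals: List[int]   # per-base Phred qualities


def parse_cigar(cigar: str):
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def unclipped_five_prime(read: Read) -> int:
    """5' reference coordinate of the read, with clipped bases added back."""
    ops = parse_cigar(read.cigar)
    if not read.reverse:
        # Forward strand: 5' end is on the left; subtract leading clips.
        lead = 0
        for n, op in ops:
            if op not in CLIPS:
                break
            lead += n
        return read.pos - lead
    # Reverse strand: 5' end is on the right; add aligned span and trailing clips.
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    trail = 0
    for n, op in reversed(ops):
        if op not in CLIPS:
            break
        trail += n
    return read.pos + ref_len - 1 + trail


def mark_duplicates(reads: List[Read]) -> Set[str]:
    """Return names of reads flagged as duplicates (all but the best per group)."""
    groups = defaultdict(list)
    for r in reads:
        groups[(r.chrom, unclipped_five_prime(r), r.reverse)].append(r)
    duplicates = set()
    for members in groups.values():
        best = max(members, key=lambda m: sum(m.quals))
        duplicates.update(m.name for m in members if m is not best)
    return duplicates


reads = [
    Read("a", "chr1", 100, False, "50M", [30] * 50),
    Read("b", "chr1", 103, False, "3S47M", [20] * 50),  # same unclipped 5' start as "a"
    Read("c", "chr1", 100, True, "50M", [30] * 50),     # reverse strand: different key
]
print(mark_duplicates(reads))  # {'b'}
```

Note that reads "a" and "b" are grouped even though their aligned starts differ and their sequences are never compared; only the position, strand and CIGAR matter in this model.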

Will the answer to this question influence your decision to use it or not in any way?

Written 14 months ago by YaGalbi

Yes. Duplicate removal has influenced my results dramatically in the past.

Written 14 months ago by blur90

I hope you do not want to remove duplicates from RNA-seq data, as the tags on your post suggest.

Written 14 months ago by ATpoint

That is exactly why this operation is so dangerous: you had better be sure that the removed duplicates are all artificial and not a natural consequence of high coverage.

There is a common myth floating around that "duplicates" is a synonym for "error". That is a remnant of the days when coverage was typically low.

Written 14 months ago by Istvan Albert
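A back-of-the-envelope calculation shows why high coverage alone produces "duplicates". The sketch below assumes single-end reads whose start positions are spread uniformly at random over a region (real RNA-seq coverage is far from uniform, which makes collisions even more likely), and no PCR or optical duplication at all. A region of length L offers only 2L (position, strand) slots, so by the standard occupancy formula the expected number of distinct slots hit by n reads is 2L(1 - (1 - 1/(2L))^n); every other read would be flagged by a purely position-based method. The function name is hypothetical.

```python
def expected_positional_duplicates(n_reads: int, region_length: int) -> float:
    """Expected number of reads flagged as duplicates when n_reads single-end
    reads start uniformly at random among the 2 * region_length
    (position, strand) slots of a region, with no technical duplication."""
    slots = 2 * region_length
    # Expected number of occupied slots (classic balls-into-bins occupancy).
    expected_distinct = slots * (1 - (1 - 1 / slots) ** n_reads)
    # All reads beyond the first one in each occupied slot count as duplicates.
    return n_reads - expected_distinct


for n in (100, 1_000, 10_000, 100_000):
    dup = expected_positional_duplicates(n, region_length=2_000)
    print(f"{n:>7} reads on a 2 kb transcript: ~{dup / n:.1%} flagged as duplicates")
```

Under these assumptions, a highly expressed 2 kb transcript with 10,000 reads would already have more than half of its reads flagged, none of them artificial. Paired-end data lowers this rate because both mates' positions must match, but it does not remove the effect.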

Keeping in mind @ATpoint's note, if you do want to remove PCR/optical duplicates for other reasons, use Clumpify (see the Biostars answer "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files"). It does not need the data to be aligned and works directly from the read sequences.

Written 14 months ago by genomax
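The contrast with position-based marking is that alignment-free tools compare the reads themselves. As a rough illustration only, here is a minimal Python sketch that keeps the first occurrence of each exact (read 1, read 2) sequence pair from two FASTQ streams. This is not Clumpify's algorithm: Clumpify groups reads that share k-mers and, in its dedupe mode, can tolerate a few mismatches, whereas this sketch requires identical sequences and holds every distinct pair in memory. `read_fastq` and `dedupe_pairs` are hypothetical names, and the input is assumed to be plain 4-line-per-record FASTQ.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a 4-line-per-record FASTQ stream.
    A truncated final record is silently ignored."""
    while True:
        lines = list(islice(handle, 4))
        if len(lines) < 4:
            return
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines)
        yield header, seq, qual


def dedupe_pairs(r1: TextIO, r2: TextIO) -> List[Tuple[Record, Record]]:
    """Keep the first pair seen for each exact (read-1, read-2) sequence."""
    seen = set()
    kept = []
    for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
        key = (rec1[1], rec2[1])  # compare sequences only, not positions
        if key not in seen:
            seen.add(key)
            kept.append((rec1, rec2))
    return kept


r1 = io.StringIO("@p1/1\nACGT\n+\nIIII\n@p2/1\nACGT\n+\nIIII\n@p3/1\nTTTT\n+\nIIII\n")
r2 = io.StringIO("@p1/2\nGGCC\n+\nIIII\n@p2/2\nGGCC\n+\nIIII\n@p3/2\nGGCC\n+\nIIII\n")
kept = dedupe_pairs(r1, r2)
print([rec1[0] for rec1, _ in kept])  # ['@p1/1', '@p3/1']
```

Because a single sequencing error makes two copies look distinct, exact matching underestimates duplication; tolerating mismatches, as Clumpify can, is the main refinement over this sketch.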