Picard MarkDuplicates and SamTools rmdup algorithm documentation
7.9 years ago
Mark Ebbert ▴ 90

Hi,

I'm looking for official algorithm documentation on Picard MarkDuplicates and SamTools rmdup, but I can't find it. I have found numerous posts in "google land" where people state why one is better, but I want to know exactly how they both work (preferably without going through the code). For example, I have "heard" that MarkDuplicates is more "intelligent" because it allegedly considers variants within a read rather than just looking at where reads begin.

Can anyone point me towards a paper or documentation that discusses the true differences between the algorithms?

Thanks for your help!

Mark

markduplicates samtools picard rmdup • 8.9k views
7.9 years ago

Warning: this answer is old. Check the current version of samtools.

SamTools rmdup 'only' compares two reads on chrom and pos. That can be wrong when the two reads come from two different libraries, because they are then not duplicates of each other. It also removes the reads from the BAM, so information is lost.
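The chrom-and-pos comparison described above can be sketched as a toy example. This is not the samtools code: the `Read` record, the `rmdup_like` function and the rule of keeping the first read seen are all assumptions made for illustration. It only shows why keying on position alone drops a read from a second library and why the dropped reads cannot be recovered from the output.

```python
from typing import NamedTuple


class Read(NamedTuple):
    # Hypothetical minimal read record, not a real BAM/SAM parser type.
    name: str
    library: str
    chrom: str
    pos: int


def rmdup_like(reads):
    """Keep one read per (chrom, pos); drop the rest from the output."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos)  # the library is ignored on purpose
        if key in seen:
            continue  # the read is discarded, so the information is lost
        seen.add(key)
        kept.append(read)
    return kept


reads = [
    Read("r1", "libA", "chr1", 100),
    Read("r2", "libA", "chr1", 100),  # same library, same position
    Read("r3", "libB", "chr1", 100),  # different library, same position
    Read("r4", "libA", "chr1", 250),
]
kept = rmdup_like(reads)
print([r.name for r in kept])
```

In this sketch `r3` is dropped along with `r2`, even though it comes from a different library.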

Picard sets the SAM flag 1024 (0x400) on duplicates but does not delete the reads. As far as I know, two pairs of reads are compared using the chrom, the pos, the group-id (sample...), plus the flowcell, lane, X and Y for optical duplicates (and the CIGAR string?).
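A minimal sketch of the flag-instead-of-delete behaviour, again as a toy example and not Picard's implementation. The `Read` record, `mark_duplicates_like` and the choice of `(library, chrom, pos)` as the key are assumptions based only on the description above. Optical-duplicate detection and pair handling are left out.

```python
from dataclasses import dataclass

DUPLICATE_FLAG = 0x400  # SAM flag 1024


@dataclass
class Read:
    # Hypothetical minimal read record, not a real BAM/SAM parser type.
    name: str
    library: str
    chrom: str
    pos: int
    flag: int = 0


def mark_duplicates_like(reads):
    """Flag later reads sharing (library, chrom, pos); never remove any."""
    seen = set()
    for read in reads:
        key = (read.library, read.chrom, read.pos)
        if key in seen:
            read.flag |= DUPLICATE_FLAG  # mark, but keep the read
        else:
            seen.add(key)
    return reads


reads = [
    Read("r1", "libA", "chr1", 100),
    Read("r2", "libA", "chr1", 100),  # same library, same position
    Read("r3", "libB", "chr1", 100),  # different library, same position
    Read("r4", "libA", "chr1", 250),
]
marked = mark_duplicates_like(reads)
flagged = [r.name for r in marked if r.flag & DUPLICATE_FLAG]
print(len(marked), flagged)
```

All four reads stay in the output and only `r2` carries the duplicate bit, so a downstream tool can still choose to include or skip it.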

Thank you Pierre, that's similar to what I've been hearing, but do you know where this is documented? I'm curious where you learned it. Thanks!


I looked at the sources.


Great, thanks Pierre!
