Question: samtools markdup vs. Picard's MarkDuplicates when removing duplicate reads
nanoide wrote (5 weeks ago):

Hi there,

So I'm currently analyzing some ATAC-seq data. Duplicate reads were removed by running samtools fixmate -m followed by samtools markdup -rs. A lot of reads were discarded, and I cannot repeat this step for now (maybe in the future). I was wondering: are there any known differences between this method and others, such as Picard's MarkDuplicates?
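For reference, the fixmate/markdup chain mentioned above is usually run between two sorts: fixmate expects reads grouped by name, while markdup expects coordinate-sorted input. The Python below is a minimal sketch that only assembles that shell pipeline as a string; the file names, the thread count and the helper name samtools_dedup_pipeline are placeholders, not part of the original post.

```python
import shlex


def samtools_dedup_pipeline(in_bam: str, out_bam: str, threads: int = 4) -> str:
    """Build: name-sort | fixmate -m | coordinate-sort | markdup -r -s (hypothetical helper)."""
    t = str(threads)
    steps = [
        # fixmate needs name-grouped reads, so sort by read name first (-n)
        ["samtools", "sort", "-n", "-@", t, in_bam],
        # -m adds the mate-score (ms) and mate-CIGAR (MC) tags that markdup uses
        ["samtools", "fixmate", "-m", "-", "-"],
        # markdup needs coordinate-sorted input, so sort again by position
        ["samtools", "sort", "-@", t, "-"],
        # -r removes duplicates instead of only flagging them; -s prints stats to stderr
        ["samtools", "markdup", "-r", "-s", "-", out_bam],
    ]
    # shlex.join quotes each argument safely for a POSIX shell
    return " | ".join(shlex.join(step) for step in steps)


print(samtools_dedup_pipeline("in.bam", "out.bam"))
```

To actually run it you could pass the string to subprocess.run(cmd, shell=True, check=True); note that a plain sh pipeline only reports the exit status of its last command, so a failure in an earlier stage may go unnoticed.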

Would it be worth trying other methods for removing duplicate reads, or should the duplication percentage be about the same?

Any advice would be appreciated.


ATpoint wrote (5 weeks ago):

As far as I know, the methods perform similarly for most applications, and the differences mainly affect edge cases such as supplementary alignments. Another option is samblaster, which I use in my ATAC-seq pipeline. In the end it probably does not matter much. The advantage of samblaster and samtools over Picard is that they use far less memory and can be used in Unix pipes. In ATAC-seq it is not uncommon to see some duplication, probably due to mitochondrial contamination. I would not worry too much about that and would rather check whether the downstream analysis indicates good quality: the number of callable peaks, the fraction of reads in peaks (FRiP), and a good signal-to-noise ratio when inspecting reads in a genome browser (distinct peaks without much noise).
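To illustrate why the tools usually agree, here is a toy Python model of the shared core idea, under simplifying assumptions: read pairs count as duplicates when both mates start at the same (unclipped) 5' positions with the same orientations on the same chromosome, and the pair with the highest summed base quality is kept. The Pair class, the helper names and the example data are hypothetical. Real tools additionally handle soft clipping, unpaired and supplementary reads, optical duplicates and tie-breaking, which is where their results can diverge.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """Simplified read pair (assumption: both mates on the same chromosome)."""
    name: str
    chrom: str
    pos1: int       # unclipped 5' position of mate 1
    strand1: str
    pos2: int       # unclipped 5' position of mate 2
    strand2: str
    qual_sum: int   # summed base qualities of both mates


def mark_duplicates(pairs):
    """Return the names of pairs flagged as duplicates in this toy model."""
    groups = defaultdict(list)
    for p in pairs:
        # sort the two mate ends so that read1/read2 labelling does not matter
        ends = tuple(sorted([(p.pos1, p.strand1), (p.pos2, p.strand2)]))
        groups[(p.chrom, ends)].append(p)
    dups = set()
    for members in groups.values():
        # keep the highest-quality pair; break ties by name for determinism
        best = min(members, key=lambda p: (-p.qual_sum, p.name))
        dups.update(p.name for p in members if p is not best)
    return dups


def duplication_rate(pairs):
    """Fraction of pairs flagged as duplicates (0.0 for empty input)."""
    pairs = list(pairs)
    return len(mark_duplicates(pairs)) / len(pairs) if pairs else 0.0


# toy data, not real reads: "c" is the same fragment as "a" and "b" with mates swapped
example_pairs = [
    Pair("a", "chr1", 100, "+", 300, "-", 900),
    Pair("b", "chr1", 100, "+", 300, "-", 850),
    Pair("c", "chr1", 300, "-", 100, "+", 950),
    Pair("d", "chr1", 500, "+", 700, "-", 800),
]
print(sorted(mark_duplicates(example_pairs)), duplication_rate(example_pairs))
```

On the toy data, "c" is kept because it has the highest quality sum, so "a" and "b" are flagged and the duplication rate is 0.5.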


Thanks for the insights! Regards

(reply written 5 weeks ago by nanoide)
predeus wrote (5 weeks ago):

Unless something has changed dramatically, you should use Picard rather than samtools to mark duplicates in an aligned file. Even Heng Li (the author of samtools) said that he does not recommend using samtools markdup. The topic has been discussed quite a lot, for example here:

If you need to remove the duplicates rather than just flag them, make sure you set REMOVE_DUPLICATES=true in Picard MarkDuplicates (the default is false).
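As a sketch of what that call looks like, the Python below assembles a Picard MarkDuplicates command line in the older KEY=VALUE syntax; picard.jar, the file names and the helper name are placeholders. Newer Picard releases use the dash syntax instead (-I, -O, -M and --REMOVE_DUPLICATES true), so check which form your version expects.

```python
import shlex


def picard_markdup_cmd(in_bam: str, out_bam: str, metrics: str,
                       picard_jar: str = "picard.jar", remove: bool = True) -> str:
    """Assemble a Picard MarkDuplicates call (hypothetical helper, legacy syntax)."""
    args = [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"I={in_bam}",    # input BAM, typically coordinate-sorted
        f"O={out_bam}",   # output BAM
        f"M={metrics}",   # duplication metrics, including PERCENT_DUPLICATION
        # Picard's default is false: duplicates are only flagged (0x400), not dropped
        f"REMOVE_DUPLICATES={'true' if remove else 'false'}",
    ]
    return shlex.join(args)


print(picard_markdup_cmd("in.bam", "out.bam", "dup_metrics.txt"))
```

Keeping the metrics file is useful here, since its PERCENT_DUPLICATION value can be set against the statistics printed by samtools markdup -s.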


That discussion is from 2010 and concerns samtools rmdup, not markdup. rmdup is now deprecated, and markdup is its more recent replacement. To my knowledge (correct me if I am wrong), a good benchmark of markdup vs. Picard is still missing, but as I said above, I would be surprised if it made a notable difference for a standard paired-end dataset.
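One way to start such a comparison on your own data, sketched under assumptions: run both tools in flag-only mode (samtools markdup without -r, Picard with REMOVE_DUPLICATES=false) on the same input, dump the names of reads carrying the duplicate flag with something like samtools view -f 1024 marked.bam | cut -f 1 | sort -u > names.txt (file names are placeholders), and compare the two name sets. The Python below only does the set comparison; read_names and compare_duplicate_calls are hypothetical helpers.

```python
from __future__ import annotations


def read_names(path: str) -> set[str]:
    """Load one read name per line, skipping blank lines."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def compare_duplicate_calls(dups_a: set[str], dups_b: set[str]) -> dict:
    """Summarise agreement between two sets of duplicate-flagged read names."""
    shared = dups_a & dups_b
    union = dups_a | dups_b
    return {
        "shared": len(shared),
        "only_a": len(dups_a - dups_b),
        "only_b": len(dups_b - dups_a),
        # Jaccard index: 1.0 means both tools flagged exactly the same reads
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# usage with placeholder file names:
# compare_duplicate_calls(read_names("samtools_dups.txt"), read_names("picard_dups.txt"))
```

Reads flagged by only one tool are the interesting ones to inspect in a genome browser, since they point at the edge cases where the implementations differ.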

(reply written 5 weeks ago by ATpoint)

Good call. Like I said, "unless something has changed"!

I agree, the results should be very comparable.

(reply written 5 weeks ago by predeus)

Thank you both for the comments. Regards

(reply written 5 weeks ago by nanoide)


Powered by Biostar version 2.3.0