Question: samtools markdup vs PICARD's MarkDuplicates when removing duplicated reads
nanoide wrote (11 months ago):

Hi there,

I'm currently analyzing some ATAC-seq data. Duplicate reads were removed by first running samtools fixmate -m and then samtools markdup -rs. Many reads were discarded, and I cannot repeat this step right now (maybe in the future). Are there any known differences between this method and others such as Picard's MarkDuplicates?

Would it be worth trying other methods for removing duplicate reads, or should the percentage be about the same?
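
For reference, the samtools route mentioned in the question can be sketched as follows. This is a minimal sketch, not the asker's actual commands: the file names are hypothetical, and the collate and sort steps are an assumption based on the usual samtools workflow (fixmate expects name-grouped input, markdup expects coordinate-sorted input). The commands are only built and printed here; nothing runs unless `run=True` is passed on a machine with samtools installed.

```python
import subprocess

def samtools_dedup_commands(in_bam, out_bam, prefix="tmp"):
    """Build the samtools commands for duplicate removal (nothing is executed here)."""
    # Hypothetical intermediate file names, chosen for illustration only.
    collated = f"{prefix}.collate.bam"
    fixmated = f"{prefix}.fixmate.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # Group reads by name so that mates are adjacent.
        ["samtools", "collate", "-o", collated, in_bam],
        # -m adds the mate score tags that markdup uses to choose which read to keep.
        ["samtools", "fixmate", "-m", collated, fixmated],
        # markdup needs coordinate-sorted input.
        ["samtools", "sort", "-o", sorted_bam, fixmated],
        # -r removes duplicates instead of only flagging them, -s prints basic statistics.
        ["samtools", "markdup", "-r", "-s", sorted_bam, out_bam],
    ]

def run_all(commands, run=False):
    """Print each command; execute it only when run=True."""
    for cmd in commands:
        print(" ".join(cmd))
        if run:
            subprocess.run(cmd, check=True)

commands = samtools_dedup_commands("sample.bam", "sample.dedup.bam")
run_all(commands)  # dry run: prints the commands only
```

Leaving out `-r` makes markdup flag duplicates without deleting them, so the decision to drop them can be revisited later without redoing the whole step.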

Any advice would be appreciated.

Thanks

ATpoint wrote (11 months ago):

As far as I know, the methods perform similarly for most applications, and differences mainly affect edge cases such as supplementary alignments. Another option would be samblaster, which I use in my ATAC-seq pipeline. In the end it probably does not matter. The advantage of samblaster/samtools over Picard is that they use far less memory and can be used in Unix pipes. In ATAC-seq it is not uncommon to have some duplication, probably due to mitochondrial contamination. I would not worry too much about that and would rather check whether the downstream analysis indicates good quality: number of callable peaks, Fraction of Reads in Peaks (FRiP), and a good signal-to-noise ratio when inspecting reads in a genome browser (distinct peaks without much noise).
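
To make one of these downstream checks concrete, here is a toy sketch of the fraction of reads falling in peaks (often abbreviated FRiP). It is an illustration under simplifying assumptions, not how any particular pipeline computes it: each read is reduced to a single position, peaks are assumed not to overlap, and all coordinates below are made up. Real analyses would compute this from the BAM and the peak file with dedicated tools.

```python
from bisect import bisect_right

def frip(read_positions, peaks):
    """Fraction of reads whose position falls inside any peak.

    read_positions: list of (chrom, pos)
    peaks: list of (chrom, start, end), half-open, assumed non-overlapping
    """
    # Index peaks per chromosome, sorted by start coordinate.
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    in_peaks = 0
    for chrom, pos in read_positions:
        intervals = by_chrom.get(chrom, [])
        # Index of the last peak that starts at or before pos (or -1 if none).
        i = bisect_right(intervals, (pos, float("inf"))) - 1
        if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
            in_peaks += 1
    return in_peaks / len(read_positions) if read_positions else 0.0

# Made-up toy data: two of the five reads fall inside a peak.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
reads = [("chr1", 150), ("chr1", 250), ("chr1", 599), ("chr2", 10), ("chrM", 5)]
score = frip(reads, peaks)
print(score)
```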


Thanks for the insights! Regards

(reply written 11 months ago by nanoide)
predeus wrote (11 months ago):

Unless something has changed dramatically, you should use Picard and not samtools to mark duplicates in an aligned file. Even Heng Li (the author of samtools) said that he does not recommend using samtools markdup. The topic was discussed quite a lot, for example, here: http://seqanswers.com/forums/showthread.php?t=6854

If you need to remove the duplicates, make sure you set the appropriate flag in Picard MarkDuplicates.
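
As a rough sketch of what that looks like, the snippet below builds a Picard MarkDuplicates command with `REMOVE_DUPLICATES=true`. The jar path and file names are hypothetical, and the classic `KEY=value` argument style is assumed; depending on the Picard version the argument syntax may differ. The command is only built and printed, not executed.

```python
def picard_markdup_command(in_bam, out_bam, metrics, remove=True, jar="picard.jar"):
    """Build a Picard MarkDuplicates command line (nothing is executed here)."""
    cmd = [
        "java", "-jar", jar, "MarkDuplicates",
        f"I={in_bam}",    # input BAM
        f"O={out_bam}",   # output BAM
        f"M={metrics}",   # duplication metrics file
    ]
    if remove:
        # Without this, duplicates are only flagged and stay in the output.
        cmd.append("REMOVE_DUPLICATES=true")
    return cmd

# Hypothetical file names, for illustration only.
cmd = picard_markdup_command("sample.sorted.bam", "sample.dedup.bam", "sample.dup_metrics.txt")
print(" ".join(cmd))
```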


This is a discussion from 2010 about samtools rmdup, not markdup. rmdup is now deprecated, with markdup being a recent replacement. To the best of my knowledge (correct me if I am wrong) a good benchmark of markdup vs Picard is still missing, but as said above, I would be surprised if it made a notable difference for a standard paired-end dataset.
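
In the absence of such a benchmark, a quick check on one's own data is possible. The sketch below is only an illustration: it assumes both tools were run in marking mode (duplicates flagged, not removed) and that their outputs were exported as SAM text, and it compares which reads carry the duplicate flag (0x400). The SAM records are made up, and identifying a read by name plus first/second mate is a simplification that ignores secondary and supplementary alignments.

```python
def duplicate_reads(sam_lines):
    """Return the set of (read name, is_read2) pairs flagged as duplicate (0x400)."""
    dups = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x400:
            dups.add((qname, bool(flag & 0x80)))
    return dups

def compare(dups_a, dups_b):
    """Summarise the agreement between two sets of duplicate-flagged reads."""
    shared = dups_a & dups_b
    union = dups_a | dups_b
    return {
        "only_a": len(dups_a - dups_b),
        "only_b": len(dups_b - dups_a),
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }

# Made-up records: flag 99 = first mate, not duplicate; 1123 = 99 + 1024 (duplicate).
tool_a = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*",
    "r2\t1123\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*",
    "r3\t1123\tchr1\t300\t60\t50M\t=\t400\t150\t*\t*",
]
tool_b = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*",
    "r2\t1123\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*",
    "r3\t99\tchr1\t300\t60\t50M\t=\t400\t150\t*\t*",
]
summary = compare(duplicate_reads(tool_a), duplicate_reads(tool_b))
print(summary)
```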

(reply written 11 months ago by ATpoint)

Good call - like I said, "unless something has changed"!

I agree, the results should be very comparable.

(reply written 11 months ago by predeus)

Thank you both for the comments, regards

(reply written 11 months ago by nanoide)