Question: samtools markdup vs samtools rmdup
3 months ago by
a.abnousi30 wrote:

I know samtools rmdup is obsolete and markdup should be used instead. My old pipeline used rmdup and now I'm trying to upgrade it to use markdup.

When comparing the results of the two with default settings, rmdup removes more reads on my test dataset (188M vs 185M remaining). From the manual, it looks like markdup by default removes PCR duplicates and not optical duplicates, which I think is what rmdup does too (rmdup has no option for dealing with optical duplicates).

Where does this difference come from? How can I reproduce results similar to samtools rmdup using samtools markdup?


If it is not documented, then it is unlikely that rmdup did that. Still, why bother with something like this? I recommend just using markdup (since it is the currently recommended tool within samtools) and then proceeding with the analysis. One can spend a lot of time on these low-level details, but in the end there is no benefit in overthinking it.
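
For anyone switching a pipeline over, here is a rough Python sketch of the step order that the samtools manual describes for markdup (collate, fixmate -m, position sort, markdup). The function name `markdup_commands`, the `remove` switch, and all file names are made up for illustration. It only builds the command lines and does not run them. Passing `-r` to markdup removes duplicates instead of only marking them, but this is not claimed to reproduce rmdup's output.

```python
import shlex


def markdup_commands(in_bam, out_bam, remove=False, prefix="tmp"):
    """Build the samtools steps for duplicate marking.

    All file names are placeholders; `prefix` names the intermediate files.
    """
    collated = f"{prefix}.collate.bam"
    fixmated = f"{prefix}.fixmate.bam"
    sorted_bam = f"{prefix}.possort.bam"

    markdup = ["samtools", "markdup"]
    if remove:
        # -r removes duplicate reads instead of only flagging them
        markdup.append("-r")
    markdup += [sorted_bam, out_bam]

    return [
        # group reads by name so mates sit next to each other
        ["samtools", "collate", "-o", collated, in_bam],
        # -m adds the mate score tags that markdup uses
        ["samtools", "fixmate", "-m", collated, fixmated],
        # markdup expects coordinate-sorted input
        ["samtools", "sort", "-o", sorted_bam, fixmated],
        markdup,
    ]


if __name__ == "__main__":
    for cmd in markdup_commands("input.bam", "dedup.bam", remove=True):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them against the manual of the installed samtools version before running anything.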

written 3 months ago by ATpoint34k

I was thinking the same recently. Since I need to do variant calling, I was wondering whether we should remove duplicate reads or just mark them, and whether it affects the variant calling. If I just use markdup, will the duplicates be ignored?

written 3 months ago by User000370

Yes, a proper variant calling tool will ignore duplicates if these are marked as such.
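
To make "marked as such" concrete: in the SAM format a duplicate is flagged by setting bit 0x400 (decimal 1024) in the FLAG column, so a tool can skip those reads without them being deleted from the file. Below is a minimal Python sketch of that check on SAM text lines. The function names and the three toy records are invented for illustration, and real tools work on BAM through a proper parser rather than splitting text.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: read is a PCR or optical duplicate


def is_duplicate(sam_line):
    """Return True if the record's FLAG (second column) has the duplicate bit set."""
    flag = int(sam_line.split("\t")[1])
    return bool(flag & DUPLICATE_FLAG)


def count_reads(sam_lines):
    """Return (total records, records flagged as duplicate), skipping header lines."""
    total = 0
    duplicates = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, not an alignment record
        total += 1
        if is_duplicate(line):
            duplicates += 1
    return total, duplicates


if __name__ == "__main__":
    # Toy records: only the first columns matter for this check.
    toy = [
        "@HD\tVN:1.6\tSO:coordinate",
        "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
        "read2\t1123\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",  # 99 + 1024
        "read3\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII",
    ]
    total, duplicates = count_reads(toy)
    print(f"{total} records, {duplicates} flagged as duplicate")
```

On a real file, `samtools view -c -F 1024 file.bam` counts the records without the duplicate flag, which is the number to compare against the read count of a file where duplicates were removed.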

written 3 months ago by ATpoint34k