Samtools Rmdup And Picard Mark Duplicates
11.3 years ago
Kssr ▴ 110

I ran FastQC and found a 33% duplication level in my sample. It is single-end data with an average coverage of 10x. I then used samtools rmdup and Picard MarkDuplicates, and the duplication level dropped to 1%. I have a few questions about removing duplicates:

1. Do both samtools and Picard remove duplicates based on position alone? How is Picard MarkDuplicates different from rmdup? (They give very similar results, though.) Just curious to know which one is better.

2. I am not sure whether it is advisable to remove duplicates from single-end data, or how the above programs treat such data.

3. When I run samtools rmdup, it prints:

[bam_rmdupse_core] 3566092 / 20492754 = 0.1740

My final deduplicated .bam has 20979669 reads. I don't understand which value is used as the denominator in the line above, i.e. 20492754. Any comments/suggestions appreciated.
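For reference, the arithmetic behind the mismatch can be spelled out in a few lines. This is only a sketch using the three numbers quoted above; it assumes the log counts and the final read count were obtained in a comparable way, and it does not say where the extra reads come from.

```python
# Numbers copied from the rmdup log line and the final BAM read count above.
removed = 3566092        # numerator printed by [bam_rmdupse_core]
considered = 20492754    # denominator printed by [bam_rmdupse_core]
final_reads = 20979669   # reads counted in the deduplicated BAM

# The printed fraction is simply numerator / denominator.
ratio = removed / considered
print(f"{ratio:.4f}")

# If the denominator were the whole input, this many reads should remain.
expected_remaining = considered - removed
print(expected_remaining)

# Reads present in the output that the log line does not account for.
unaccounted = final_reads - expected_remaining
print(unaccounted)
```

The ratio reproduces the 0.1740 in the log, but the output file holds more reads than the denominator alone can explain, so the denominator cannot be the total number of reads in the input.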

samtools picard duplicates • 11k views
11.3 years ago

Short answer: duplicate identification may use sequence identity or mapping locations. For the latter, a read needs to be mapped to a location, so unmapped reads are not processed. This explains what you see. The optimal solution depends on many factors; the consensus seems to be that Picard MarkDuplicates may be the best current solution.
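To make the position-based idea concrete, here is a toy sketch in Python. It is not the actual code of samtools or Picard: the `Read` record, the `mark_duplicates_se` function and the rule of keeping the read with the highest mapping quality are all assumptions made for illustration. It only shows that single-end reads sharing chromosome, 5' position and strand are grouped, and that unmapped reads are never flagged because they have no coordinate to compare.

```python
from collections import namedtuple

# Hypothetical minimal read record; real tools operate on BAM records.
# `pos` stands for the 5' mapping coordinate of the read.
Read = namedtuple("Read", "name chrom pos strand mapq mapped")

def mark_duplicates_se(reads):
    """Return the names of single-end reads flagged as duplicates."""
    best = {}           # (chrom, pos, strand) -> read kept so far
    duplicates = set()
    for r in reads:
        if not r.mapped:
            continue    # no coordinate, so the read is left untouched
        key = (r.chrom, r.pos, r.strand)
        kept = best.get(key)
        if kept is None:
            best[key] = r
        elif r.mapq > kept.mapq:
            # Illustrative tie-break only: keep the higher mapping quality.
            duplicates.add(kept.name)
            best[key] = r
        else:
            duplicates.add(r.name)
    return duplicates

reads = [
    Read("r1", "chr1", 100, "+", 30, True),
    Read("r2", "chr1", 100, "+", 40, True),   # same start and strand as r1
    Read("r3", "chr1", 100, "-", 30, True),   # other strand, not a duplicate
    Read("r4", None, None, None, 0, False),   # unmapped
    Read("r5", None, None, None, 0, False),   # unmapped, never compared to r4
]
dups = mark_duplicates_se(reads)
print(sorted(dups))  # ['r1']
```

In this toy input the two unmapped reads stay in the output regardless of their sequence, which is the kind of behaviour that can make the read count of a deduplicated file differ from what a position-based log line suggests.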

The appropriateness of duplicate removal depends on coverage: one wants to remove only artificial duplicates and keep the natural ones.

There are many similar questions on Biostar; try the search above.
