Samtools rmdup and Picard MarkDuplicates
6.7 years ago
Prakash ★ 2.2k

Hello Biostars,

I have a question about removing duplicates from a BAM file. I used two tools, samtools rmdup and Picard MarkDuplicates.

I would like to understand why the two tools remove different numbers of duplicate reads.

The documentation says:

The MarkDuplicates tool works by comparing sequences in the 5 prime positions of both reads and read-pairs in a SAM/BAM file

Samtools rmdup: if multiple read pairs have identical external coordinates, only retain the pair with highest mapping quality. In the paired-end mode, this command ONLY works with FR orientation and requires ISIZE is correctly set
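To make the two quoted rules concrete, here is a toy Python sketch. It is not the code of either tool, and every name and example pair in it is hypothetical. It assumes each pair has already been reduced to chromosome, 5' position and strand for both ends, and it ignores soft-clipping, read groups/libraries and optical duplicates. The samtools-style rule keys on the outer coordinates of a pair on one chromosome and keeps the highest MAPQ; pairs whose mates lie on different chromosomes are left alone, following the Picard FAQ statement reported in the answer below. The Picard-style rule keys on both 5' ends and strands and keeps the pair with the highest sum of base qualities, which is Picard's default DUPLICATE_SCORING_STRATEGY (SUM_OF_BASE_QUALITIES). The -S flag in this post's rmdup command makes samtools treat paired-end reads as single-end (per the rmdup usage text), so this paired-end model may not describe that particular run.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List, Optional


@dataclass(frozen=True)
class Pair:
    """Hypothetical toy read pair; 5' positions are 0-based and assumed unclipped."""
    name: str
    chrom1: str
    five1: int
    strand1: str
    chrom2: str
    five2: int
    strand2: str
    mapq: int      # score used by the samtools-style rule
    qual_sum: int  # score used by the Picard-style rule


def samtools_like_key(p: Pair) -> Optional[Hashable]:
    """Outer coordinates of an FR pair on one chromosome.
    Returns None (pair is never deduplicated) when mates are on different chromosomes."""
    if p.chrom1 != p.chrom2:
        return None
    return (p.chrom1, min(p.five1, p.five2), max(p.five1, p.five2))


def picard_like_key(p: Pair) -> Hashable:
    """Chromosome, 5' position and strand of both ends, independent of read order."""
    return tuple(sorted([(p.chrom1, p.five1, p.strand1),
                         (p.chrom2, p.five2, p.strand2)]))


def dedup(pairs: List[Pair],
          key_fn: Callable[[Pair], Optional[Hashable]],
          score_fn: Callable[[Pair], int]) -> List[str]:
    """Keep the best-scoring pair per key (first seen wins ties); keyless pairs are all kept."""
    best = {}
    unkeyed = []
    for p in pairs:
        k = key_fn(p)
        if k is None:
            unkeyed.append(p)
        elif k not in best or score_fn(p) > score_fn(best[k]):
            best[k] = p
    return sorted(p.name for p in [*best.values(), *unkeyed])


PAIRS = [
    Pair("a", "chr1", 100, "+", "chr1", 400, "-", mapq=60, qual_sum=900),
    Pair("b", "chr1", 100, "+", "chr1", 400, "-", mapq=30, qual_sum=950),
    Pair("c", "chr1", 100, "+", "chr5", 800, "-", mapq=60, qual_sum=800),
    Pair("d", "chr1", 100, "+", "chr5", 800, "-", mapq=60, qual_sum=700),
]

if __name__ == "__main__":
    print(dedup(PAIRS, samtools_like_key, lambda p: p.mapq))    # ['a', 'c', 'd']
    print(dedup(PAIRS, picard_like_key, lambda p: p.qual_sum))  # ['b', 'c']
```

Even on four pairs the two rules disagree twice: they keep different members of the same intrachromosomal duplicate pair (MAPQ vs. base-quality score), and only the Picard-style key collapses the interchromosomal duplicates c and d.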

Before removing duplicates (samtools flagstat)

57757837 + 0 in total (QC-passed reads + QC-failed reads)
3902505 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
57757837 + 0 mapped (100.00% : N/A)
53855332 + 0 paired in sequencing
27555734 + 0 read1
26299598 + 0 read2
44132336 + 0 properly paired (81.95% : N/A)
45979358 + 0 with itself and mate mapped
7875974 + 0 singletons (14.62% : N/A)
278406 + 0 with mate mapped to a different chr
123930 + 0 with mate mapped to a different chr (mapQ>=5)

Samtools rmdup result

command: samtools rmdup -S input.bam output.bam

17595767 + 0 in total (QC-passed reads + QC-failed reads)
1161712 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
17595767 + 0 mapped (100.00% : N/A)
16434055 + 0 paired in sequencing
8398895 + 0 read1
8035160 + 0 read2
14057950 + 0 properly paired (85.54% : N/A)
14586355 + 0 with itself and mate mapped
1847700 + 0 singletons (11.24% : N/A)
75278 + 0 with mate mapped to a different chr
38992 + 0 with mate mapped to a different chr (mapQ>=5)

Picard MarkDuplicates result

command: java -jar /apps/picard.jar MarkDuplicates I=input.bam O=output.bam M=marked_dup_metrics.txt REMOVE_DUPLICATES=true

41909982 + 0 in total (QC-passed reads + QC-failed reads)
3902505 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
41909982 + 0 mapped (100.00% : N/A)
38007477 + 0 paired in sequencing
19124624 + 0 read1
18882853 + 0 read2
34855826 + 0 properly paired (91.71% : N/A)
36456720 + 0 with itself and mate mapped
1550757 + 0 singletons (4.08% : N/A)
244146 + 0 with mate mapped to a different chr
107980 + 0 with mate mapped to a different chr (mapQ>=5)
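To put the three flagstat reports side by side, this short sketch does the arithmetic on the posted totals. The labels are my own and assume the samtools run was `rmdup -S`. Because flagstat counts alignment records, including secondary ones, it also reports the change in primary records (total minus secondary).

```python
"""Duplicate-removal arithmetic on the samtools flagstat totals posted in the question."""

# (total records, secondary records) copied from each flagstat report
REPORTS = {
    "before": (57757837, 3902505),
    "samtools rmdup -S": (17595767, 1161712),
    "Picard MarkDuplicates": (41909982, 3902505),
}


def removal_summary(before, after):
    """Return (records removed, % of all records removed, % of primary records removed)."""
    total_b, sec_b = before
    total_a, sec_a = after
    primary_b = total_b - sec_b
    primary_a = total_a - sec_a
    removed = total_b - total_a
    return (
        removed,
        100.0 * removed / total_b,
        100.0 * (primary_b - primary_a) / primary_b,
    )


if __name__ == "__main__":
    for tool in ("samtools rmdup -S", "Picard MarkDuplicates"):
        n, pct_all, pct_primary = removal_summary(REPORTS["before"], REPORTS[tool])
        print(f"{tool}: removed {n} records "
              f"({pct_all:.1f}% of all, {pct_primary:.1f}% of primary)")
```

With these numbers the samtools run removed 40162070 records (about 69.5%), while Picard removed 15847855 (about 27.4% of all records, 29.4% of primary records). Picard's output still contains all 3902505 secondary records, whereas the samtools output keeps only 1161712.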

Because they use different algorithms. Picard MarkDuplicates seems better.


If you ask this kind of question, please provide the command lines you used so that people can reproduce what you did.


I have edited my query.

6.7 years ago

A similar discussion can be found here.

According to the Picard FAQ, samtools rmdup does not remove interchromosomal duplicates, whereas Picard MarkDuplicates does.
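One rough way to compare this statement with the question's data is the flagstat line "with mate mapped to a different chr", before and after each tool. The sketch below only does arithmetic on those posted counts; the labels assume the samtools run used `rmdup -S`, which (per the samtools usage text) treats paired-end reads as single-end, so the paired-end behavior described in the FAQ may not apply to that run.

```python
"""Interchromosomal records kept by each tool, from the question's flagstat reports."""

# "with mate mapped to a different chr" counts copied from the question
INTERCHROM = {
    "before": 278406,
    "samtools rmdup -S": 75278,
    "Picard MarkDuplicates": 244146,
}


def percent_retained(before: int, after: int) -> float:
    """Share of interchromosomal records still present after deduplication."""
    return 100.0 * after / before


if __name__ == "__main__":
    for tool in ("samtools rmdup -S", "Picard MarkDuplicates"):
        kept = INTERCHROM[tool]
        pct = percent_retained(INTERCHROM["before"], kept)
        print(f"{tool}: {kept} of {INTERCHROM['before']} kept ({pct:.1f}%)")
```

The samtools run kept about 27.0% of these records and Picard about 87.7%. So in this dataset the samtools run removed far more interchromosomal records, not fewer. One possible reading is that in -S mode each read is deduplicated on its own position, regardless of where its mate maps.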
