Question: Samtools rmdup and Picard MarkDuplicates
prakash wrote (15 months ago):

Hello Bio Stars,

I have a question about duplicate removal from a BAM file. I used two tools: samtools rmdup and Picard MarkDuplicates.

I would like to understand why the two tools remove different numbers of duplicate reads.

The documentation says:

Picard MarkDuplicates: The MarkDuplicates tool works by comparing sequences in the 5 prime positions of both reads and read-pairs in a SAM/BAM file.

Samtools rmdup: If multiple read pairs have identical external coordinates, only retain the pair with highest mapping quality. In the paired-end mode, this command ONLY works with FR orientation and requires ISIZE is correctly set.
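To make the rmdup description concrete, here is a toy sketch in Python of "identical external coordinates, keep the highest mapping quality". It is not the implementation of either tool: the `ReadPair` record, its field names, and the `dedup_by_external_coords` helper are hypothetical and only illustrate the rule as quoted.

```python
from collections import namedtuple

# Hypothetical record for one FR read pair; field names are illustrative only.
# start/end stand for the outermost (external) coordinates of the pair.
ReadPair = namedtuple("ReadPair", "name chrom start end mapq")


def dedup_by_external_coords(pairs):
    """Keep one pair per (chrom, start, end), preferring the highest mapq."""
    best = {}
    for pair in pairs:
        key = (pair.chrom, pair.start, pair.end)
        if key not in best or pair.mapq > best[key].mapq:
            best[key] = pair
    return list(best.values())


pairs = [
    ReadPair("a", "chr1", 100, 350, 60),
    ReadPair("b", "chr1", 100, 350, 30),  # same external coordinates as "a"
    ReadPair("c", "chr1", 100, 360, 60),  # different end, so not a duplicate
]
kept = dedup_by_external_coords(pairs)
print(sorted(p.name for p in kept))
```

In this toy input, pair "b" is dropped because it shares its coordinates with "a" and has the lower mapping quality. A tool that builds its key differently, or picks the representative pair by another criterion, can report a different number of duplicates on the same data.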

Before removing duplicates

57757837 + 0 in total (QC-passed reads + QC-failed reads)
3902505 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
57757837 + 0 mapped (100.00% : N/A)
53855332 + 0 paired in sequencing
27555734 + 0 read1
26299598 + 0 read2
44132336 + 0 properly paired (81.95% : N/A)
45979358 + 0 with itself and mate mapped
7875974 + 0 singletons (14.62% : N/A)
278406 + 0 with mate mapped to a different chr
123930 + 0 with mate mapped to a different chr (mapQ>=5)

Samtools rmdup result

command: samtools rmdup -S input.bam output.bam

17595767 + 0 in total (QC-passed reads + QC-failed reads)
1161712 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
17595767 + 0 mapped (100.00% : N/A)
16434055 + 0 paired in sequencing
8398895 + 0 read1
8035160 + 0 read2
14057950 + 0 properly paired (85.54% : N/A)
14586355 + 0 with itself and mate mapped
1847700 + 0 singletons (11.24% : N/A)
75278 + 0 with mate mapped to a different chr
38992 + 0 with mate mapped to a different chr (mapQ>=5)

Picard MarkDuplicates result

command: java -jar /apps/picard.jar MarkDuplicates I=input.bam O=output.bam M=marked_dup_metrics.txt REMOVE_DUPLICATES=true

41909982 + 0 in total (QC-passed reads + QC-failed reads)
3902505 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
41909982 + 0 mapped (100.00% : N/A)
38007477 + 0 paired in sequencing
19124624 + 0 read1
18882853 + 0 read2
34855826 + 0 properly paired (91.71% : N/A)
36456720 + 0 with itself and mate mapped
1550757 + 0 singletons (4.08% : N/A)
244146 + 0 with mate mapped to a different chr
107980 + 0 with mate mapped to a different chr (mapQ>=5)
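The size of the gap is easier to see as arithmetic. Below is a minimal Python sketch that uses only the "in total" and "secondary" counts from the three flagstat reports above; the dictionary layout and the `removed` helper are illustrative, not part of either tool.

```python
# Counts copied from the "in total" and "secondary" lines of the flagstat reports.
before = {"total": 57757837, "secondary": 3902505}
after = {
    "samtools rmdup": {"total": 17595767, "secondary": 1161712},
    "Picard MarkDuplicates": {"total": 41909982, "secondary": 3902505},
}


def removed(before_counts, after_counts, field):
    """Number of records of the given kind that disappeared after deduplication."""
    return before_counts[field] - after_counts[field]


for tool, counts in after.items():
    n_removed = removed(before, counts, "total")
    pct = 100.0 * n_removed / before["total"]
    n_secondary = removed(before, counts, "secondary")
    print(f"{tool}: removed {n_removed} records ({pct:.1f}%), "
          f"including {n_secondary} secondary alignments")
```

One thing the counts show directly: the secondary-alignment count is unchanged after the Picard run (3902505 before and after), while it drops after the rmdup run, so the two runs did not even consider the same set of records.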

Because they use different algorithms. Picard MarkDuplicates seems better.

written 15 months ago by t2g4free

If you ask this kind of question, provide the command lines you used so that people can reproduce what you did.

written 15 months ago by ATpoint

I have edited my query.

written 15 months ago by prakash
Vijay Lakhujani wrote (15 months ago):

A similar discussion can be found here.

According to the Picard FAQ, samtools rmdup does not remove interchromosomal duplicates, while Picard MarkDuplicates does.
