Question: Picard vs samtools rmdup
2.5 years ago, xd_d wrote:

Hey all,

I want to remove duplicates from my BAM file.

I use Picard MarkDuplicates with REMOVE_DUPLICATES=true to remove the duplicates.

After running Picard to "remove all duplicates", I still found reads in the BAM file that carry the duplicate flag, and I found duplicate clusters that were not removed. I thought Picard removes all reads that are flagged as duplicates?

That's why I used samtools rmdup in paired-end mode. It removes more reads than Picard. But why?

I thought that with Picard I would remove all duplicates (optical and PCR).

I'm confused.

rna-seq samtools picard • 5.7k views
modified 2.5 years ago by igor • written 2.5 years ago by xd_d

Please post the exact commands you ran, along with the samtools flagstat output before and after removing duplicates.

modified 2.5 years ago • written 2.5 years ago by geek_y
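
One way to check whether reads are still marked as duplicates is to look at bit 0x400 of the SAM FLAG field, which the SAM specification defines as "PCR or optical duplicate". The sketch below is an illustration and not part of the original thread: it counts flagged records in SAM text. The function name `count_duplicate_flags` and the example records are hypothetical.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate


def count_duplicate_flags(sam_lines):
    """Return (total_records, duplicate_flagged_records) for SAM text lines."""
    total = 0
    flagged = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        total += 1
        if flag & DUPLICATE_FLAG:
            flagged += 1
    return total, flagged


# Hypothetical records, for illustration only.
example_records = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r2\t1123\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",  # 1123 = 99 + 1024
    "r3\t0\tchr1\t500\t60\t50M\t*\t0\t0\t*\t*",
]

print(count_duplicate_flags(example_records))  # (3, 1)
```

In practice the lines could come from the output of `samtools view` piped to the script's standard input, run once on the BAM file before and once after duplicate removal. If the second count is not zero, flagged reads are still present in the output.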

I will post my problem below :)

written 2.5 years ago by xd_d

Try clumpify.sh from the BBMap suite instead (see the thread "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files. And remove duplicates.").

written 2.5 years ago by genomax

I will check this out. Next time I will post the Picard case where not all duplicates are actually removed.

written 2.5 years ago by xd_d

I will post it next time.

In the end, I want unique reads with unique coordinates.

written 2.5 years ago by xd_d

Please also post examples of the remaining duplicates.

written 2.5 years ago by Devon Ryan

Please define "unique coordinates" more precisely. Do you mean only one read covering each base, or at most one read starting at each base position?

modified 2.5 years ago • written 2.5 years ago by genomax

My definition of unique coordinates: only the start position should be unique. Is there a program that extracts such reads from BAM files? I know I will lose the paired-end information, but that is not important for my next step.

written 2.5 years ago by xd_d

I think a very similar question was recently asked here. Let me see if I can find that thread.

written 2.5 years ago by genomax

Thank you! Later I will post the Picard results where duplicate reads were not removed.

written 2.5 years ago by xd_d

Thanks! I used awk to get unique start positions :)

written 2.5 years ago by xd_d
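
The awk command itself was not posted. As an illustration only, the following Python sketch keeps the first record seen for each start position in SAM text. It assumes that "start position" means the reference name plus the POS field (the leftmost mapped base), optionally combined with the strand. The function name `keep_unique_starts` and the example records are hypothetical.

```python
def keep_unique_starts(sam_lines, strand_aware=True):
    """Keep header lines and the first mapped record per start position.

    Assumption: the start position is (RNAME, POS), plus the strand when
    strand_aware is True. POS is the leftmost mapped base, so for
    reverse-strand reads it is not the 5' end of the read.
    """
    seen = set()
    kept = []
    for line in sam_lines:
        if not line.strip():
            continue
        if line.startswith("@"):
            kept.append(line)  # pass header lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # unmapped read: it has no coordinate to compare
            continue
        key = (fields[2], int(fields[3]))  # (RNAME, POS)
        if strand_aware:
            key += (bool(flag & 0x10),)  # 0x10 = read on the reverse strand
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


# Hypothetical records, for illustration only.
records = [
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",   # same start and strand as r1
    "r3\t16\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",  # same start, reverse strand
    "r4\t0\tchr1\t200\t60\t50M\t*\t0\t0\t*\t*",
]

print([rec.split("\t")[0] for rec in keep_unique_starts(records)])
# ['r1', 'r3', 'r4']
```

Keeping the first record is an arbitrary choice made for this sketch. With `strand_aware=False`, r3 would also be dropped, because it shares its reference name and POS with r1.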
2.5 years ago, igor (United States) wrote:

This previous thread about the exact differences between Samtools and Picard duplicate removal might be helpful: Picard MarkDuplicates and SamTools rmdup algorithm documentation

Also, this really old thread: http://seqanswers.com/forums/showthread.php?t=5424

modified 2.5 years ago • written 2.5 years ago by igor
2.5 years ago, xd_d wrote:

I started a new thread, because the Picard issue is a separate topic.

modified 2.5 years ago • written 2.5 years ago by xd_d