Question: samtools rmdup error
ttsutsui1028 wrote (3 months ago):

I am trying to remove PCR duplicates from my SAM file. When I run samtools rmdup, it aborts partway through:

  samtools rmdup my_data.sam my_data_rmdup.sam
  [bam_rmdup_core] processing reference chr6...
  samtools(5713,0x7fffa2def340) malloc: *** error for object 0x7f999f600508: incorrect checksum for freed object - object was probably modified after being freed.
  *** set a breakpoint in malloc_error_break to debug
  Abort trap: 6

I can't solve this issue. My setup is:

  ProductName: Mac OS X
  ProductVersion: 10.13.2
  Memory: 32 GB
  samtools Version: 1.6 (using htslib 1.6)

I would appreciate any advice. Thank you.

Tags: wgbs, samtools

Try again after converting the file to BAM.

(reply by mbk0asis, 3 months ago)
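A minimal sketch of that conversion, assuming samtools is on the PATH and reusing the file names from the question. The function name `sam_to_bam` is hypothetical. This step only changes the format; it does not sort the file.

```shell
#!/bin/sh
# Hypothetical helper: write a BAM copy of a SAM file.
# $1 = input SAM, $2 = output BAM.
# samtools view -b selects BAM output; -o names the output file.
sam_to_bam() {
    samtools view -b -o "$2" "$1"
}

# Usage (file names from the question):
#   sam_to_bam my_data.sam my_data.bam
```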
Answer: h.mon wrote (3 months ago):

samtools rmdup should not be used; the samtools manual says:

samtools rmdup [-sS] <> <out.bam>

This command is obsolete. Use markdup instead.

In addition, samtools markdup needs a BAM (not SAM) file that has been run through samtools fixmate -m and then sorted by coordinate.

samtools markdup [-l length] [-r] [-s] in.algsort.bam out.bam

Mark duplicate alignments from a coordinate sorted file that has been run through fixmate with the -m option. This program relies on the MC and ms tags that fixmate provides.
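The workflow above can be chained roughly as follows. This is a sketch assuming samtools 1.6 or later and paired-end reads; the function name and intermediate file names are made up for illustration. Because fixmate expects mates to be grouped together, the input is name-sorted first, and only the fixmate output is coordinate-sorted for markdup.

```shell
#!/bin/sh
# Hypothetical wrapper: mark duplicates with samtools markdup.
# $1 = input SAM/BAM, $2 = output BAM. Intermediate names are illustrative.
markdup_pipeline() {
    input=$1
    output=$2
    prefix=${output%.bam}

    # 1. Name-sort so that mates are adjacent (fixmate needs this).
    samtools sort -n -o "$prefix.namesort.bam" "$input" || return 1

    # 2. Fill in mate information; -m adds the MC and ms tags markdup relies on.
    samtools fixmate -m "$prefix.namesort.bam" "$prefix.fixmate.bam" || return 1

    # 3. markdup expects coordinate order.
    samtools sort -o "$prefix.possort.bam" "$prefix.fixmate.bam" || return 1

    # 4. Flag duplicates; add -r to remove them instead of only marking them.
    samtools markdup "$prefix.possort.bam" "$output" || return 1

    samtools index "$output"
}

# Usage:
#   markdup_pipeline my_data.sam my_data.markdup.bam
```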


It is perfectly fine to use rmdup, especially on large datasets where time and I/O are limiting factors. rmdup is not perfect, but, at least for WGS, the final differences from Picard MarkDuplicates are small (source here).

(reply by ATpoint, 3 months ago)

While we are at it: recently I've been using bamsort from biobambam to sort and mark or remove duplicates. The nice thing is that you can stream alignments from bwa into bamsort and get a sorted, duplicate-marked, and indexed BAM file almost for free, since everything happens in one stream. bamsort also reports duplicate metrics similar to Picard MarkDuplicates. For example:

bwa mem ref.fa R1.fq.gz R2.fq.gz \
| bamsort inputformat=sam markduplicates=1 rmdup=0 fixmates=1 inputthreads=8 outputthreads=8 \
M=aln.dupmetrics.txt O=aln.bam index=1 indexfilename=aln.bam.bai

(Of course, the aligner doesn't have to be bwa, as long as the input to bamsort is SAM or BAM collated by read name.)

(reply by dariober, 3 months ago)
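For alignments that already exist on disk, one option (a sketch, not something tested in this thread) is to name-sort them with samtools and stream the result into bamsort. It assumes biobambam2's bamsort accepts `inputformat=bam` on standard input, reuses the bamsort options from the bwa example, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sort, mark duplicates in, and index an existing BAM via bamsort.
# $1 = input SAM/BAM, $2 = output prefix.
# samtools sort -n without -o writes name-sorted BAM to stdout;
# inputformat=bam is assumed to match that stream.
bam_to_bamsort() {
    samtools sort -n -O bam "$1" \
    | bamsort inputformat=bam markduplicates=1 rmdup=0 fixmates=1 \
        inputthreads=8 outputthreads=8 \
        M="$2.dupmetrics.txt" O="$2.bam" index=1 indexfilename="$2.bam.bai"
}

# Usage:
#   bam_to_bamsort existing_aln.bam aln
```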

Cool, thanks for mentioning that tool; I will check it out. I have always found it a pain that Picard MarkDuplicates and sambamba markdup cannot be used in a pipe, especially for WGS cohorts.

(reply by ATpoint, 3 months ago)

Nonetheless, rmdup works on a coordinate-sorted BAM, not on a SAM file.

(reply by h.mon, 3 months ago)
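A minimal sketch of running rmdup that way, assuming paired-end data and the file names from the question (the helper name is hypothetical): sort the SAM into a coordinate-sorted BAM first, then pass that BAM to rmdup.

```shell
#!/bin/sh
# Hypothetical helper: coordinate-sort a SAM/BAM file, then run samtools rmdup.
# $1 = input SAM/BAM, $2 = output BAM.
sorted_rmdup() {
    sorted=${2%.bam}.sorted.bam
    # samtools sort reads SAM directly and writes a coordinate-sorted BAM.
    samtools sort -o "$sorted" "$1" || return 1
    # rmdup works on paired-end reads by default; add -s for single-end data.
    samtools rmdup "$sorted" "$2"
}

# Usage:
#   sorted_rmdup my_data.sam my_data_rmdup.bam
```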

Agreed, except that samtools fixmate needs a name-sorted file; a coordinate-sorted file is not accepted.

(reply by Satyajeet Khare, 23 days ago)
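One way to check a file's order before running fixmate: samtools sort records it in the `@HD` header line (`SO:queryname` after `sort -n`, `SO:coordinate` otherwise). This is a small sketch; the function name is hypothetical, and files produced by other tools may not carry an SO tag at all.

```shell
#!/bin/sh
# Hypothetical helper: print the SO (sort order) value from a BAM's @HD header line.
# Prints "unknown" when the header has no SO tag.
sort_order() {
    so=$(samtools view -H "$1" \
        | awk -F'\t' '$1 == "@HD" { for (i = 2; i <= NF; i++) if ($i ~ /^SO:/) print substr($i, 4) }')
    echo "${so:-unknown}"
}

# Usage:
#   sort_order my_data.bam   # e.g. queryname or coordinate
```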