Question: samtools rmdup error
ttsutsui102810 wrote:

I am trying to remove PCR duplicates from my SAM file. When I run samtools rmdup, it aborts partway through.

  samtools rmdup my_data.sam my_data_rmdup.sam
[bam_rmdup_core] processing reference chr6...
....
samtools(5713,0x7fffa2def340) malloc: *** error for object 0x7f999f600508: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6

I can't solve this issue. My environment is as follows:
ProductName: Mac OS X
ProductVersion: 10.13.2
Memory: 32 GB
samtools version: 1.6 (using htslib 1.6)

I would appreciate any advice. Thank you.

ADD COMMENTlink modified 9 months ago by h.mon19k • written 9 months ago by ttsutsui102810

Try again after converting the file to BAM.
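
A minimal sketch of what that conversion could look like, assuming the input is the my_data.sam from the question. The helper name rmdup_from_sam, the output file names, and the SAMTOOLS override variable are made up for illustration. samtools sort reads SAM and writes BAM by default, so sorting and converting can happen in one step:

```shell
# Hypothetical helper, not part of samtools.
# SAMTOOLS is an assumed override for the path to the samtools binary.
rmdup_from_sam() {
    # $1 = input SAM, $2 = prefix for the output files (names are illustrative)
    st="${SAMTOOLS:-samtools}"
    # Sort by coordinate; the output is BAM by default.
    $st sort -o "$2.sorted.bam" "$1" || return 1
    # rmdup expects a coordinate-sorted BAM; add -s for single-end reads.
    $st rmdup "$2.sorted.bam" "$2.rmdup.bam" || return 1
}
```

It would be called as `rmdup_from_sam my_data.sam my_data`. Whether this avoids the crash in the question is not confirmed here; it only gives rmdup the input type it expects.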

ADD REPLYlink written 9 months ago by mbk0asis380
h.mon19k wrote:

samtools rmdup should not be used:

samtools rmdup [-sS] <input.srt.bam> <out.bam>

This command is obsolete. Use markdup instead.

In addition, when using samtools markdup, the input must be a .bam (not .sam) that is sorted by coordinate and has already been run through samtools fixmate.

samtools markdup [-l length] [-r] [-s] in.algsort.bam out.bam

Mark duplicate alignments from a coordinate sorted file that has been run through fixmate with the -m option. This program relies on the MC and ms tags that fixmate provides.
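
Put together, the requirements above suggest a four-step sequence: name-sort, fixmate -m, coordinate-sort, markdup. The following is a sketch under assumptions. The function name markdup_workflow, the intermediate file names, and the SAMTOOLS override variable are hypothetical, and the -r flag (remove duplicates rather than only marking them) is included because the question asks for removal:

```shell
# Hypothetical wrapper around the samtools markdup workflow.
# SAMTOOLS is an assumed override for the path to the samtools binary.
markdup_workflow() {
    # $1 = input SAM or BAM, $2 = prefix for output files (names are illustrative)
    st="${SAMTOOLS:-samtools}"
    # 1. fixmate needs reads grouped by name, so sort by read name first
    $st sort -n -o "$2.namesort.bam" "$1" || return 1
    # 2. -m adds the mate score (ms) tag; fixmate also fills in MC
    $st fixmate -m "$2.namesort.bam" "$2.fixmate.bam" || return 1
    # 3. markdup needs coordinate order
    $st sort -o "$2.possort.bam" "$2.fixmate.bam" || return 1
    # 4. -r removes duplicates; drop it to only flag them
    $st markdup -r "$2.possort.bam" "$2.markdup.bam" || return 1
}
```

A call such as `markdup_workflow my_data.sam my_data` would leave the deduplicated alignments in my_data.markdup.bam. The intermediate files can be deleted afterwards.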

ADD COMMENTlink written 9 months ago by h.mon19k

It is perfectly fine to use rmdup, especially on larger datasets where time and I/O are limiting factors. rmdup is not perfect, but the resulting differences from Picard's MarkDuplicates are small, at least in WGS (source here).

ADD REPLYlink written 9 months ago by ATpoint7.4k

While we are at it... Recently I've been using bamsort from biobambam to sort and to mark or remove duplicates. The nice thing is that you can stream alignments from bwa to bamsort and get a sorted, duplicate-marked, and indexed BAM file almost for free, since it all works in a stream. bamsort also gives duplicate metrics similar to Picard MarkDuplicates. E.g.:

bwa mem ref.fa R1.fq.gz R2.fq.gz \
| bamsort inputformat=sam markduplicates=1 rmdup=0 fixmates=1 inputthreads=8 outputthreads=8 \
M=aln.dupmetrics.txt O=aln.bam index=1 indexfilename=aln.bam.bai

(Of course, the aligner doesn't have to be bwa, as long as the input to bamsort is SAM or BAM collated by read name.)

ADD REPLYlink modified 9 months ago • written 9 months ago by dariober9.4k

Cool, thanks for mentioning that tool. Will check it out. I always found it a pain that Picard MarkDuplicates and Sambamba markdup are not pipe-able, especially for cohorts of WGS.

ADD REPLYlink written 9 months ago by ATpoint7.4k

Nonetheless, rmdup works on a sorted bam, not on a sam file.

ADD REPLYlink written 9 months ago by h.mon19k

Agreed, except that samtools fixmate needs a name-sorted file; a coordinate-sorted file is not accepted.
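
Since each tool wants a different order, it can help to check how a file claims to be sorted before running anything. A small sketch, assuming the file has an @HD header line with an SO: field (not every file does). The function name sort_order is made up for illustration:

```shell
# Hypothetical helper: reads SAM header text on stdin and prints the
# value of the SO: field from the @HD line (e.g. coordinate, queryname).
# Prints nothing if there is no @HD line or no SO: field.
sort_order() {
    awk -F'\t' '$1 == "@HD" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SO:/) { print substr($i, 4); exit }
    }'
}
```

Usage would be `samtools view -H my_data.bam | sort_order`. Note that the header only records what the producing tool declared, so it is a hint rather than a guarantee.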

ADD REPLYlink written 6 months ago by Satyajeet Khare1.3k