Question: picard markduplicate output smaller file
gravatar for Peter Chung
22 months ago by
Peter Chung120
Hong Kong
Peter Chung120 wrote:

I am new in WGS analysis. First, I combine all my bam files into one and it's 157GB and then add read group on it to 159GB. Then I do the picard markduplicate step by using the following code:

java -Xmx8g${TMPFILE} -jar $PICARD MarkDuplicates \
INPUT=${FILE}.addRG.bam \
OUTPUT=${FILE}.addRG.mkdup.bam \

It returns no error but the output file is 18Gb and there is not metrics file generated. I don't know what happened, any advice? Thanks.

The result otuputs from picard markduplicate, but there is no error inside.

[Fri Jan 18 08:53:43 UTC 2019] picard.sam.markduplicates.MarkDuplicates done. El                         apsed time: 111.00 minutes.
To get help, see
Exception in thread "main" java.lang.IllegalArgumentException: Alignments added                          out of order in 
SAMFileWriterImpl.addAlignment for file:///data/data/Samples/CHS                         
/SRS006915/SRS006915.addRG.mkdup.bam. Sort order is coordinate. Offending record                         s are at [*:0] 
and [chrM:1]
    at htsjdk.samtools.SAMFileWriterImpl.assertPresorted(SAMFileWriterImpl.j                         ava:213)
    at htsjdk.samtools.SAMFileWriterImpl.addAlignment(                         :200)
    at picard.sam.markduplicates.MarkDuplicates.doWork(                         06)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.jav                         a:282)
    at picard.cmdline.PicardCommandLine.instanceMain(                         98)
    at picard.cmdline.PicardCommandLine.main(
ADD COMMENTlink modified 22 months ago • written 22 months ago by Peter Chung120

Hello Peter Chung ,

the message you are showing is an error. You can see it by the word Exception. A quick web search suggest, that the sorting order given in the header is different to the alignment. People who uses Picard's ReorderSam seem to have this problem.

So the questions are:

  • How did you combine your bam files?
  • How did you sort them?

fin swimmer

ADD REPLYlink written 22 months ago by finswimmer14k

oh thanks. First I used bwa to align them and then use samtools sort to sort each bam files. Afterwards, I combined all the bam files into one bam file by samtools merge. After that, I used samtools addreplacerg to add readgroup.

bwa and samtools sort

for f in $(ls -l *.bam | awk '$5 < 90000000000 {print $9}' | awk -F"_" '{print $1}'); do
    bwa mem -M -t 8 $REF ${f}_1.fastq.gz ${f}_2.fastq.gz | samtools sort > ${f}_sorted.bam;

samtools merge

FNAME=(`pwd | awk -F"/" '{print $6}'`)
LIST=$(for file in *.bam; do echo -n "$file "; done)
samtools merge -nthreads=8 ${FNAME}.bam $LIST

samtools addreplacerg

samtools addreplacerg -r 'ID:${name}' \
-r 'LB:lib1' \
-r 'PL:illumina' \
-r 'PU:unit1' \
-r 'SM:${GP}.${name}' \
-o ${name}.addRG.bam ${name}.bam

any advice? thanks.

ADD REPLYlink modified 22 months ago by finswimmer14k • written 22 months ago by Peter Chung120

Hmm, I cannot see any crucial thing. What version of samtools and picard are you using? Also maybe we can see something in the header of the input file for MarkDuplicate (samtools view -H input.bam).

BTW: You can define the ReadGroup already with bwa. Then no extra step with samtools addreplacerg is neccessary.

ADD REPLYlink written 22 months ago by finswimmer14k

Hi, have you tried increasing the heap size? also, check the TMP_DIR location has more than 159 GB free space.

ADD REPLYlink written 22 months ago by Arup Ghosh2.7k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1168 users visited in the last hour