Question: Picard MarkDuplicates output file is much smaller than the input
Posted 10 months ago by Peter Chung (Hong Kong):

I am new to WGS analysis. First, I combined all my BAM files into one (157 GB) and then added read groups, which brought it to 159 GB. Then I ran the Picard MarkDuplicates step with the following code:

java -Xmx8g ${TMPFILE} -jar $PICARD MarkDuplicates \
INPUT=${FILE}.addRG.bam \
OUTPUT=${FILE}.addRG.mkdup.bam \

It returns no error, but the output file is only 18 GB and no metrics file is generated. I don't know what happened; any advice? Thanks.
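For reference, a complete MarkDuplicates call also names a metrics file (METRICS_FILE is a required MarkDuplicates argument) and can send Picard's temporary files to a disk with enough room (TMP_DIR). The sketch below wraps this in a shell function. The function name run_markdup and the HEAP and DRY_RUN variables are hypothetical. Passing the same directory to Java via -Djava.io.tmpdir is only an assumption about what ${TMPFILE} was meant to do.

```shell
#!/usr/bin/env bash
# Sketch: run Picard MarkDuplicates with an explicit metrics file and temp dir.
# Assumptions: $PICARD points at picard.jar; HEAP and DRY_RUN are hypothetical
# knobs for this example (HEAP defaults to the 8g used in the question).
run_markdup() {
    if [ "$#" -ne 4 ]; then
        echo "usage: run_markdup INPUT.bam OUTPUT.bam METRICS.txt TMP_DIR" >&2
        return 2
    fi
    local input=$1 output=$2 metrics=$3 tmpdir=$4
    local cmd
    # Java's own temp dir and Picard's TMP_DIR both point at the same scratch space.
    cmd=(java "-Xmx${HEAP:-8g}" "-Djava.io.tmpdir=${tmpdir}" -jar "$PICARD"
        MarkDuplicates
        "INPUT=${input}"
        "OUTPUT=${output}"
        "METRICS_FILE=${metrics}"
        "TMP_DIR=${tmpdir}")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the command that would be run.
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Example (hypothetical paths):
# run_markdup SRS006915.addRG.bam SRS006915.addRG.mkdup.bam \
#     SRS006915.mkdup_metrics.txt /path/to/big/scratch
```

Setting DRY_RUN=1 prints the full command line first, so you can check it before committing to a run of nearly two hours.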

Here is the output from Picard MarkDuplicates; I don't see an error in it.

[Fri Jan 18 08:53:43 UTC 2019] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 111.00 minutes.
To get help, see
Exception in thread "main" java.lang.IllegalArgumentException: Alignments added out of order in SAMFileWriterImpl.addAlignment for file:///data/data/Samples/CHS/SRS006915/SRS006915.addRG.mkdup.bam. Sort order is coordinate. Offending records are at [*:0] and [chrM:1]
    at htsjdk.samtools.SAMFileWriterImpl.assertPresorted(SAMFileWriterImpl.java:213)
    at htsjdk.samtools.SAMFileWriterImpl.addAlignment(…:200)
    at picard.sam.markduplicates.MarkDuplicates.doWork(…06)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282)
    at picard.cmdline.PicardCommandLine.instanceMain(…98)
    at picard.cmdline.PicardCommandLine.main(…
Comment written 10 months ago by Peter Chung

Hello Peter Chung,

The message you are showing is an error; you can tell by the word "Exception". A quick web search suggests that the sort order given in the header differs from the actual order of the alignments. People who use Picard's ReorderSam seem to run into this problem.
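To check whether a file is really in the order its header claims, you can compare the records against the @SQ order. Below is a minimal sketch; it is not a samtools or Picard feature. The hypothetical function check_coord_order reads SAM text (for example from samtools view -h), ranks each reference by its position among the @SQ lines, and puts records with no reference (*) last. It then reports the first pair that goes backwards, using the same [ref:pos] form as the Picard message.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not part of samtools/Picard): read SAM text on
# stdin, e.g. `samtools view -h input.bam | check_coord_order`, and report the
# first record that breaks coordinate order as defined by the header's @SQ lines.
check_coord_order() {
    awk -F'\t' '
        # Header: remember each reference name with its position in @SQ order.
        /^@SQ/ {
            for (i = 2; i <= NF; i++) { if ($i ~ /^SN:/) rank[substr($i, 4)] = ++nseq }
            next
        }
        /^@/ { next }    # skip other header lines (@HD, @RG, @PG, ...)
        {
            if ($3 != "*" && !($3 in rank)) {
                printf "record %s uses reference %s missing from @SQ\n", $1, $3
                bad = 1
                exit 1
            }
            # Records with no reference ("*") belong after every named reference.
            r = ($3 == "*") ? nseq + 1 : rank[$3]
            pos = $4 + 0
            if (seen && (r < prev_r || (r == prev_r && pos < prev_pos))) {
                printf "out of order: [%s:%d] followed by [%s:%d]\n", prev_name, prev_pos, $3, pos
                bad = 1
                exit 1
            }
            seen = 1; prev_r = r; prev_pos = pos; prev_name = $3
        }
        END {
            if (!bad) print "coordinate order OK"
            exit bad
        }
    '
}
```

On a BAM this would be run as samtools view -h input.bam | check_coord_order. It streams the whole file, so on a 159 GB BAM expect it to take a while. It stops at the first problem it finds.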

So the questions are:

  • How did you combine your bam files?
  • How did you sort them?

fin swimmer

Reply written 10 months ago by finswimmer

Oh, thanks. First I used bwa to align the reads and samtools sort to sort each BAM file. Afterwards, I combined all the BAM files into one with samtools merge. After that, I used samtools addreplacerg to add the read group.

bwa and samtools sort

for f in $(ls -l *.bam | awk '$5 < 90000000000 {print $9}' | awk -F"_" '{print $1}'); do
    # align the paired FASTQs for this prefix and write a coordinate-sorted BAM
    bwa mem -M -t 8 "$REF" "${f}_1.fastq.gz" "${f}_2.fastq.gz" | samtools sort -o "${f}_sorted.bam" -
done

samtools merge

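FNAME=$(pwd | awk -F"/" '{print $6}')
# note: *.bam matches every BAM in this directory, not only the *_sorted.bam files
LIST=$(for file in *.bam; do echo -n "$file "; done)
# threads are set with -@; "-nthreads=8" is read as bundled short options starting with -n
# (inputs sorted by read name), which can leave the merged file out of coordinate order
samtools merge -@ 8 ${FNAME}.bam $LIST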

samtools addreplacerg

# double quotes so ${name} and ${GP} expand (single quotes pass them literally)
samtools addreplacerg -r "ID:${name}" \
-r 'LB:lib1' \
-r 'PL:illumina' \
-r 'PU:unit1' \
-r "SM:${GP}.${name}" \
-o ${name}.addRG.bam ${name}.bam

Any advice? Thanks.

Reply written 10 months ago by Peter Chung (modified by finswimmer)

Hmm, I can't see anything crucial. Which versions of samtools and Picard are you using? We might also spot something in the header of the MarkDuplicates input file (samtools view -H input.bam).
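The header check can also be scripted. The sketch below reads header text saved with samtools view -H. It can print the SO value from the @HD line, or compare the @SQ order of two headers, for example two of the *_sorted.bam files before they are merged. The function names header_sort_order, header_sq_names and same_sq_order are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): inspect header text saved with `samtools view -H`.

# Print the sort order (SO:) declared on the @HD line, if any.
header_sort_order() {
    awk -F'\t' '/^@HD/ { for (i = 2; i <= NF; i++) if ($i ~ /^SO:/) print substr($i, 4) }'
}

# Print reference names in @SQ order, one per line.
header_sq_names() {
    awk -F'\t' '/^@SQ/ { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }'
}

# Print "same" (exit 0) if two header files list their references in the same
# order, otherwise print "different" (exit 1).
same_sq_order() {
    if [ "$(header_sq_names < "$1")" = "$(header_sq_names < "$2")" ]; then
        echo same
    else
        echo different
        return 1
    fi
}

# Example (hypothetical file names):
# samtools view -H a_sorted.bam > a.hdr
# samtools view -H b_sorted.bam > b.hdr
# header_sort_order < a.hdr
# same_sq_order a.hdr b.hdr
```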

BTW: you can already set the read group with bwa; then no extra samtools addreplacerg step is necessary.
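Here is a minimal sketch of setting the read group at alignment time with bwa mem -R, reusing the LB, PL and PU values from the addreplacerg command above. The helpers make_rg and align_with_rg are hypothetical. Note that bwa expects literal two-character \t separators in the -R string and turns them into tabs itself.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): attach the read group while aligning with bwa mem -R.

# Build the read-group line. bwa expects the literal two-character "\t"
# separators here and converts them to tabs itself, so single quotes keep them as-is.
make_rg() {
    local id=$1 sample=$2
    printf '%s\n' '@RG\tID:'"$id"'\tLB:lib1\tPL:illumina\tPU:unit1\tSM:'"$sample"
}

# Align one paired-end sample and coordinate-sort it, as in the question's loop.
align_with_rg() {
    local ref=$1 prefix=$2 sample=$3
    bwa mem -M -t 8 -R "$(make_rg "$prefix" "$sample")" "$ref" \
        "${prefix}_1.fastq.gz" "${prefix}_2.fastq.gz" |
        samtools sort -@ 8 -o "${prefix}_sorted.bam" -
}

# Example (hypothetical names): align_with_rg "$REF" SRR001 CHS.SRS006915
```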

Reply written 10 months ago by finswimmer

Hi, have you tried increasing the heap size? Also, check that the TMP_DIR location has more than 159 GB of free space.
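The free-space check can be scripted as well. In this sketch, has_room_for is a hypothetical helper. It uses POSIX df -Pk to get the space available on the filesystem that holds a directory and compares it against the size of the input BAM. This follows the rule of thumb in this reply, not a documented Picard requirement. To raise the heap, increase the value in the usual Java -Xmx flag.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): succeed only if the filesystem holding directory $1
# has at least as many free bytes as file $2 is large.
has_room_for() {
    local dir=$1 file=$2 avail_kb size_bytes
    # POSIX df -Pk: one line per filesystem, column 4 = available 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    size_bytes=$(wc -c < "$file")
    # Arithmetic evaluation also tolerates the leading spaces some wc versions print.
    (( avail_kb * 1024 >= size_bytes ))
}

# Example (hypothetical paths):
# has_room_for /path/to/scratch SRS006915.addRG.bam || echo "not enough space in /path/to/scratch" >&2
```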

Reply written 10 months ago by arup