error in MarkDuplicates Picard
3.7 years ago
evelyn ▴ 230

Hello,

I have aligned multiple FASTQ files using bwa mem and produced .sorted.bam files:

INPUT_DIR=/path/trimmed
OUTPUT_DIR=/path/result
INPUT_FILE_ONE=$(ls -1 $INPUT_DIR/*_R1_paired.fastq.gz | sed -n ${RUN}p)
SAMPLE=$(basename "$INPUT_FILE_ONE" _R1_paired.fastq.gz)

bwa mem genome.fasta ${INPUT_DIR}/${SAMPLE}_R1_paired.fastq.gz ${INPUT_DIR}/${SAMPLE}_R2_paired.fastq.gz > ${OUTPUT_DIR}/${SAMPLE}_paired_bwa.sam
samtools view -S -b ${OUTPUT_DIR}/${SAMPLE}_paired_bwa.sam > ${OUTPUT_DIR}/${SAMPLE}_paired_bwa.bam
samtools sort ${OUTPUT_DIR}/${SAMPLE}_paired_bwa.bam -o ${OUTPUT_DIR}/${SAMPLE}_paired_bwa.sorted.bam
samtools index ${OUTPUT_DIR}/${SAMPLE}_paired_bwa.sorted.bam

Then I tried using Picard to remove duplicates from sorted.bam files using:

INPUT_DIR=/path/result
OUTPUT_DIR=/path/Duplicate_marking_picard
INPUT_FILE_ONE=$(ls -1 $INPUT_DIR/*_paired_bwa.sorted.bam | sed -n ${RUN}p)
SAMPLE=$(basename "$INPUT_FILE_ONE" _paired_bwa.sorted.bam)
echo "RUN #${RUN} with sample ${SAMPLE}"

java -Xms1g -Xmx3g -jar picard.jar MarkDuplicates \
I=${INPUT_DIR}/${SAMPLE}_paired_bwa.sorted.bam \
O=${OUTPUT_DIR}/${SAMPLE}_picard.sorted.bam \
M=${OUTPUT_DIR}/${SAMPLE}_metrics.txt \
TMP_DIR=`pwd`/tmp

However, after producing both a .picard.sorted.bam and a metrics.txt file for the first input files, it started producing only the .picard.sorted.bam file and no metrics.txt file. When I checked the log files for those cases, I found a long message including this error:

Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: Write error; BinaryCodec in writemode; streamed file (filename not available)
Caused by: java.io.IOException: Disk quota exceeded

I tried again but I still got the same error after getting results for some files. Thank you for the help!


Disk quota exceeded

This has nothing to do with Picard: you ran out of the space assigned to you by the system administrators. Either ask for more space or delete unneeded files.
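To find candidates for deletion, something like the helper below lists the biggest entries under a directory (a sketch; `largest_items` is just an illustrative name, not a standard tool, and /path/result stands in for your own results directory):

```shell
# List the N largest entries directly under DIR, biggest first,
# with sizes in kilobytes (du -sk is portable across Linux and macOS).
largest_items() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-20}"
}

# Example: the 20 largest items under the results directory.
largest_items /path/result
```

Intermediate .sam and unsorted .bam files from earlier pipeline steps are usually the biggest offenders and are safe to delete once the sorted BAM exists.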

You could save a lot of space by streaming the bwa mapping directly into samtools:

bwa mem genome.fasta ${INPUT_DIR}/${SAMPLE}_R1_paired.fastq.gz ${INPUT_DIR}/${SAMPLE}_R2_paired.fastq.gz \
  | samtools sort -o ${OUTPUT_DIR}/${SAMPLE}_paired_bwa.sorted.bam -
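If duplicate marking is the end goal, the whole chain can even be streamed without writing any intermediate SAM/BAM at all, using samtools' own duplicate marking instead of Picard. This is a sketch following the pipeline from the samtools markdup documentation (it requires samtools ≥ 1.12 for the -u uncompressed-stream flag; drop -u on older versions), reusing the variables from the question:

```shell
# Stream alignment straight into duplicate marking, writing only the
# final BAM. samtools markdup needs the MC/ms tags added by fixmate,
# which in turn needs name-sorted input, hence the sort -n in the middle.
bwa mem genome.fasta \
    ${INPUT_DIR}/${SAMPLE}_R1_paired.fastq.gz \
    ${INPUT_DIR}/${SAMPLE}_R2_paired.fastq.gz \
  | samtools sort -n -u - \
  | samtools fixmate -m -u - - \
  | samtools sort -u - \
  | samtools markdup - ${OUTPUT_DIR}/${SAMPLE}_markdup.sorted.bam
```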

Thank you! I have checked our group's space and we still have enough left; we did not get any space-related notification, which we usually receive once we use 90% of our assigned space. I needed the .sam files, which is why I did not skip that step.

I am wondering if it has to do with the TMP_DIR in the Picard command. I just checked and the TMP_DIR is empty, and I am not sure whether I have used it correctly on the Picard command line.


TMP_DIR may indeed be empty when there is no job running. You can re-start the job and watch that directory. You may have a separate quota on the directory where you have TMP_DIR.
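One quick way to see which filesystem TMP_DIR actually lives on and how much space that filesystem reports free (a sketch using the portable df -P output format; note that a quota can be tighter than what df shows):

```shell
# Print the available kilobytes on the filesystem holding a directory.
# df -P forces the portable one-line-per-filesystem output, so the
# "Available" column is reliably field 4 of line 2.
tmp_free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

tmp_free_kb "${TMPDIR:-/tmp}"
```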


Thank you! It was empty while the job was running, which is why I am wondering whether my code uses TMP_DIR correctly. How can I assign a separate quota to this directory?


If that did not work, then you could try -Djava.io.tmpdir=/directory_path.

By the way, did you make a directory called tmp in your working directory when using `pwd`/tmp?
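Concretely, the -Djava.io.tmpdir flag goes before -jar. A sketch of the adjusted command (/path/with/space/tmp is a placeholder for any directory on a filesystem with free quota):

```shell
# Placeholder scratch path -- point this at a filesystem with free quota.
BIGTMP=/path/with/space/tmp
mkdir -p "$BIGTMP"

# -Djava.io.tmpdir moves the JVM's own temp files; TMP_DIR moves the
# Picard/htsjdk spill files. Setting both keeps all scratch I/O off
# the quota-limited default location.
java -Xms1g -Xmx3g -Djava.io.tmpdir="$BIGTMP" -jar picard.jar MarkDuplicates \
    I=${INPUT_DIR}/${SAMPLE}_paired_bwa.sorted.bam \
    O=${OUTPUT_DIR}/${SAMPLE}_picard.sorted.bam \
    M=${OUTPUT_DIR}/${SAMPLE}_metrics.txt \
    TMP_DIR="$BIGTMP"
```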


Thank you! Yes, I made a directory called tmp in my working directory. I will try your suggestion.


That command did not work; I get the same error after some samples.


Can you post the output of the quota -s command?


Our group quota is 11118938420 kbytes used out of 13958643712 kbytes available. I resubmitted the job with increased memory; I am not sure whether it will work.
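For reference, those numbers work out to roughly 80% of the group quota used. A quick check with awk (both figures are in kbytes, so 2^30 kbytes = 1 TiB):

```shell
# Percent used and space left, computed from the quota figures above.
USED_KB=11118938420
LIMIT_KB=13958643712
awk -v u="$USED_KB" -v l="$LIMIT_KB" \
    'BEGIN { printf "%.1f%% used, %.1f TiB free\n", 100 * u / l, (l - u) / 2^30 }'
```

So the group pool still has about 2.6 TiB nominally free, which is consistent with the failing write hitting a narrower limit (per-user, or on the filesystem holding TMP_DIR) rather than the group total.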
