Nextflow GATK DepthOfCoverage: Failure working with the tmp directory
14 months ago
Eliveri ▴ 350

I have a Nextflow workflow in which the DepthOfCoverage process fails to work with the temporary directory I defined with --tmp-dir tmp.

process pf_read_depth {

    tag "tag"
    scratch true

    publishDir ...

    input:
    tuple val(pair_id), path(pf_bam)
    path refdir

    output:
    // the declared name must match the file written at the end of the script
    file("ReadCoverage_final_${pair_id}.tsv")

    script:
    """
    samtools index -bc ${pf_bam}

    for i in 01 02 03 04 05 06 07 08 09 10 11 12 13 14
    do
        gatk --java-options "-Xmx${params.gatk_memory}g -Xms${params.gatk_memory}g" DepthOfCoverage \
            -R "$refdir/genome.fasta" \
            -O chr"\$i" \
            -L Pf3D7_"\$i"_v3 \
            --omit-locus-table true \
            -I ${pf_bam} --tmp-dir tmp
        awk -F"," -v OFS="\t" '{ print \$0, \$(NF+1) = '"chr\$i"' }' chr"\$i".sample_summary > chr"\$i".sample2_summary
    done

    cat *.sample2_summary | awk '!/sample_id/ {print \$0}' | sed '1isample_id, total, mean, third_quartile, median, first_quartile, bases_perc_above_15, chromosome' > ReadCoverage_final_${pair_id}.tsv
    """

}

The error message:

Command error:
  ***********************************************************************

  A USER ERROR has occurred: Failure working with the tmp directory /scratch/776952.1.long.q/nxf.JwEioHGECG/tmp. Try changing the tmp dir with with --tmp-dir on the command line.  Exact error was should exist and have read/write access

  ***********************************************************************
  Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
  Using GATK jar /opt/miniconda/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar
  Running:
      java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx10g -Xms10g -jar /opt/miniconda/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar DepthOfCoverage -R genomes/Pf3D7.fasta -O chr01 -L Pf3D7_01_v3 --omit-locus-table true -I 8034209834_S166_L002.sorted.dup.pf.bam --tmp-dir tmp

Work dir:
  /scratch/folder/new_workflow/work/39/c5f8963f4430d08005f6cd39a51919
for i in 01 02 03 04 05 06 07 08 09 10 11 12 13 14

You're using Nextflow, so this loop could (and should) be easily parallelized.

14 months ago

Instead of filling /tmp, use a local TMP directory that the script creates itself:

process pf_read_depth {

    afterScript "rm -rf TMP"
    (...)
    script:
    """
    mkdir -p TMP

    gatk --tmp-dir TMP (....)
    """
}
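
The error text in the question says the directory "should exist and have read/write access", which suggests that GATK does not create a missing --tmp-dir on its own. Below is a minimal shell sketch of a pre-flight check in the spirit of the mkdir -p TMP step above. The helper name ensure_tmp_dir is hypothetical (it is not part of GATK or Nextflow), and inside a Nextflow """ script block the shell variables would need escaping (for example \$dir), as the question already does with \$i.

```shell
# Hypothetical helper (not part of GATK or Nextflow): create a task-local
# temporary directory and confirm it exists and is readable and writable.
ensure_tmp_dir() {
    dir="$1"
    mkdir -p "$dir" || return 1
    if [ ! -d "$dir" ] || [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
        echo "tmp dir '$dir' is missing or lacks read/write access" >&2
        return 1
    fi
    # Print the absolute path so it can be passed on unambiguously.
    (cd "$dir" && pwd)
}

# Example call; the resulting path could then be given to gatk via --tmp-dir.
tmp_path=$(ensure_tmp_dir TMP) && echo "using tmp dir: $tmp_path"
```

Failing early with a clear message is only a convenience here; the mkdir -p line alone is what addresses the reported error.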

Thank you! Using afterScript improved memory usage

14 months ago
Eliveri ▴ 350

I found that a small change, from --tmp-dir tmp to --tmp-dir /tmp, seems to have fixed the issue.


We ended up using --tmp-dir . in nf-core for all GATK processes (cf. modules/nf-core/gatk4/applybqsr/).
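
The three values tried in this thread (tmp, /tmp and .) differ mainly in whether the directory already exists when the task starts: a relative tmp is resolved against the task's working directory, which in the error message above is the scratch directory /scratch/776952.1.long.q/nxf.JwEioHGECG. The shell sketch below mimics an empty task directory and reports which candidates are usable. The name check_tmp_candidates is hypothetical, and using mktemp -d as a stand-in for the Nextflow scratch directory is an assumption.

```shell
# Mimic a freshly created, empty task directory and report which of the
# --tmp-dir values mentioned in this thread already exist and are writable.
# The function body runs in a subshell so the cd does not affect the caller.
check_tmp_candidates() (
    task_dir=$(mktemp -d) || exit 1
    cd "$task_dir" || exit 1
    for candidate in tmp /tmp .; do
        if [ -d "$candidate" ] && [ -w "$candidate" ]; then
            echo "$candidate: usable"
        else
            echo "$candidate: missing or not writable"
        fi
    done
    # Clean up the stand-in task directory.
    cd / && rmdir "$task_dir"
)

check_tmp_candidates
```

In an empty directory the relative tmp is reported as missing while . is usable, which is consistent with both the mkdir -p TMP answer and the --tmp-dir . convention.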


Thank you for pointing me to a great repository of modules!


With pleasure. You can use our Python tools (https://nf-co.re/tools/#install-modules-in-a-pipeline) to install them easily if you want; you don't need an nf-core pipeline for that.
