Question: CWL : file could not be created with exception File not found
2.7 years ago, skanwal wrote:

Hi,

We are trying to create a GATK workflow using CWL draft-3. The "PrintReads" tool throws the following error:

##### ERROR A USER ERROR has occurred (version 2.8-1-g932cd3a): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: **Couldn't write file /var/spool/cwl/printReads-2016-25-07.bam because file could not be created with exception File not found: printReads-2016-25-07.bam**
##### ERROR ------------------------------------------------------------------------------------------
Error collecting output for parameter 'output_printReads'
Traceback (most recent call last):
  File "build/bdist.macosx-10.11-intel/egg/cwltool/draft2tool.py", line 330, in collect_output_ports
    ret[fragment] = self.collect_output(port, builder, outdir)
  File "build/bdist.macosx-10.11-intel/egg/cwltool/draft2tool.py", line 405, in collect_output
    raise WorkflowException("Did not find output file with glob pattern: '{}'".format(globpatterns))
WorkflowException: Did not find output file with glob pattern: '['printReads-2016-25-07.bam']'

The command generated by the tool is:

 run \
    -i \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta:/var/lib/cwl/stg301d393b/hg19.fasta:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.amb:/var/lib/cwl/stg301d393b/hg19.fasta.amb:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.ann:/var/lib/cwl/stg301d393b/hg19.fasta.ann:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.rbwt:/var/lib/cwl/stg301d393b/hg19.fasta.rbwt:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.pac:/var/lib/cwl/stg301d393b/hg19.fasta.pac:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.sa:/var/lib/cwl/stg301d393b/hg19.fasta.sa:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/indelRealigner-2016-03-07.bai:/var/lib/cwl/stg6208674/indelRealigner-2016-03-07.bai:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.fai:/var/lib/cwl/stg301d393b/hg19.fasta.fai:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/indelRealigner-2016-03-07.bam:/var/lib/cwl/stg6208674/indelRealigner-2016-03-07.bam:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.rsa:/var/lib/cwl/stg301d393b/hg19.fasta.rsa:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.bwt:/var/lib/cwl/stg301d393b/hg19.fasta.bwt:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.dict:/var/lib/cwl/stg301d393b/hg19.dict:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.rpac:/var/lib/cwl/stg301d393b/hg19.fasta.rpac:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/baseRecalibrator-2016-03-07.table:/var/lib/cwl/stg2d0f63bf/baseRecalibrator-2016-03-07.table:ro' \
    --volume=/var/folders/q6/2hgf2qtj1k5gw2vkbfxjn6br0000gp/T/tmpaTh8Zp:/var/spool/cwl:rw \
    --volume=/var/folders/q6/2hgf2qtj1k5gw2vkbfxjn6br0000gp/T/tmpbBNmiv:/tmp:rw \
    --workdir=/var/spool/cwl \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/var/spool/cwl \
    --env=PATH=/usr/local/bin/:/usr/bin:/bin \
    fzkhan/picard-1.136-gatk-2.8 \
    java \
    -Xmx4g \
    -Djava.io.tmpdir=/tmp \
    -jar \
    /home/biodocker/bin/GenomeAnalysisTK-2.8-1-g932cd3a/GenomeAnalysisTK.jar \
    -T \
    PrintReads \
    -R \
    /var/lib/cwl/stg301d393b/hg19.fasta \
    -I \
    /var/lib/cwl/stg6208674/indelRealigner-2016-03-07.bam \
    -BQSR \
    /var/lib/cwl/stg2d0f63bf/baseRecalibrator-2016-03-07.table \
    -o \
    printReads-2016-25-07.bam

Can someone please suggest a possible reason for this? I am not sure why the tool is unable to create the final output file. Thanks.

Tags: cwl, gatk
modified 2.7 years ago • written 2.7 years ago by skanwal

Can you share your CWL descriptions & how you ran cwltool? Thanks!

written 2.7 years ago by Michael R. Crusoe

The command we are running is:

cwltool --debug \
    --tmpdir-prefix=/Users/sehrish/Google\ Drive/Analysis\ using\ Galaxy\,\ Cpipe\ and\ CWL/GATK-worflow/draft3/ \
    --tmp-outdir-prefix=/Users/sehrish/Google\ Drive/Analysis\ using\ Galaxy\,\ Cpipe\ and\ CWL/GATK-worflow/draft3/ \
    ./GATK-PrintReads.cwl ./GATK-PrintReads.json

We have also tried using:

cwltool --debug GATK-PrintReads.cwl GATK-PrintReads.json

Here is the link to the CWL descriptions: https://gist.github.com/skanwal/3cf091a377cc49597f3cc045be369122

written 2.7 years ago by skanwal

Does this work without Docker, that is, when cwltool is run with --no-container?

written 2.7 years ago by Michael R. Crusoe

We have changed the script to use a jar file, but now it cannot find the jar file and throws the following error:

[job GATK-PrintReads.cwl] /var/folders/q6/2hgf2qtj1k5gw2vkbfxjn6br0000gp/T/tmpbVvTvC$ java \
    -Xmx4g \
    -Djava.io.tmpdir=/tmp \
    -jar \
    '/Users/sehrish/Google Drive/Analysis using Galaxy, Taverna, cpipe and CWL/GATK-workflow/tools/GATK-2.8/GenomeAnalysisTK-2.8-1-g932cd3a/GenomeAnalysisTK.jar' \
    -T \
    PrintReads \
    -R \
    /var/folders/q6/2hgf2qtj1k5gw2vkbfxjn6br0000gp/T/tmpmmXRLF/stg21864f8f/hg19.fasta \
    -I \
    /var/folders/q6/2hgf2qtj1k5gw2vkbfxjn6br0000gp/T/tmpmmXRLF/stg10d0cee8/indelRealigner-2016-03-07.bam \
    -BQSR \
    /var/folders/q6/2hgf2qtj1k5gw2vkbfxjn6br0000gp/T/tmpmmXRLF/stg21f0b0ef/baseRecalibrator-2016-03-07.table \
    -o \
    printReads-2016-25-07.bam
Error: **Unable to access jarfile /Users/sehrish/Google Drive/Analysis using Galaxy, Taverna, cpipe and CWL/GATK-workflow/tools/GATK-2.8/GenomeAnalysisTK-2.8-1-g932cd3a/GenomeAnalysisTK.jar**
Error collecting output for parameter 'output_printReads'
modified 2.7 years ago • written 2.7 years ago by skanwal

So the jar is now an input to the tool?

Can you update your gist or post a new one?

written 2.7 years ago by Michael R. Crusoe

Yes. I have added an argument for the jar file to the CWL tool description and updated the gist with the new file.

written 2.7 years ago by skanwal

One problem:

The jar needs to be a separate input of type File.
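
As a rough illustration of what a separate File input could look like, here is a minimal sketch in the list syntax for inputs. The input name gatk_jar and the position value are hypothetical and not taken from the gist; the JVM options are the ones visible in the generated command in this thread.

```yaml
# Fragment of a tool description; "gatk_jar" is a hypothetical input name.
baseCommand: [java, "-Xmx4g", "-Djava.io.tmpdir=/tmp"]
inputs:
  - id: gatk_jar
    type: File
    inputBinding:
      # Assumed position: must sort before "-T PrintReads" and the other
      # GATK arguments so that "-jar <path>" directly follows the JVM options.
      position: 1
      prefix: -jar
```

The job input document would then carry a matching gatk_jar entry with class: File and a path, written the same way as the reference and BAM entries. Because the jar is declared as a File, cwltool stages it like any other input instead of receiving it as a plain string.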

Alternatively you can follow the approach recommended in https://github.com/common-workflow-language/workflows/issues/78#issuecomment-217117284

  1. Remove explicit mention of the jar (Lines 195-197)

  2. Change the baseCommand:

baseCommand: [ java, org.broadinstitute.sting.gatk.CommandLineGATK ]

  3. Set the CLASSPATH environment variable to the location of the installed jar file:
hints:
 - class: DockerRequirement
   dockerPull: fzkhan/picard-1.136-gatk-2.8
 - class: EnvVarRequirement
   CLASSPATH: /home/biodocker/bin/GenomeAnalysisTK-2.8-1-g932cd3a/
modified 2.7 years ago • written 2.7 years ago by Michael R. Crusoe

I have incorporated these changes and updated the gist accordingly. It now throws the following error: Tool definition failed validation: Validating hint EnvVarRequirement: missing required field envDef

Should CLASSPATH be the path inside the Docker image or the path of the locally installed jar file?

written 2.7 years ago by skanwal

Pardon me, that should have been:

hints:
 - class: DockerRequirement
   dockerPull: fzkhan/picard-1.136-gatk-2.8
 - class: EnvVarRequirement
   envDef: 
     - envName: CLASSPATH
       envValue: /home/biodocker/bin/GenomeAnalysisTK-2.8-1-g932cd3a/

I got that path from your Docker image. If running locally you would set the CLASSPATH in your job input document:

#GATK-PrintReads.yaml
"cwl:requirements":
 - class: EnvVarRequirement
   envDef: 
     - envName: CLASSPATH
       envValue: /local/path/to/GATK-jar/
inputBam_printReads:
  class: File
  path: "../tools/indelRealigner-2016-03-07.bam"
reference:
  class: File
  path: "../tools/outputFiles/hg19.fasta"
input_baseRecalibrator:
  class: File
  path: "../tools/baseRecalibrator-2016-03-07.table"
outputfile_printReads: printReads-2016-25-07.bam
modified 2.7 years ago • written 2.7 years ago by Michael R. Crusoe

Hi.

I have tried with and without Docker. It complains about the "envValue" field (see the error below). I have also updated the gist with the new CWL tool definition I am using.

/usr/local/bin/cwltool 1.0.20160701210210
Tool definition failed validation:
Validating hint `EnvVarRequirement`: could not validate field `envDef` because
  At position 0
    missing required field `envValue`
Traceback (most recent call last):
  File "build/bdist.macosx-10.11-intel/egg/cwltool/main.py", line 643, in main
    makeTool, {})
  File "build/bdist.macosx-10.11-intel/egg/cwltool/load_tool.py", line 163, in make_tool
    tool = makeTool(processobj, **kwargs)
  File "build/bdist.macosx-10.11-intel/egg/cwltool/workflow.py", line 32, in defaultMakeTool
    return draft2tool.CommandLineTool(toolpath_object, **kwargs)
  File "build/bdist.macosx-10.11-intel/egg/cwltool/draft2tool.py", line 117, in __init__
    super(CommandLineTool, self).__init__(toolpath_object, **kwargs)
  File "build/bdist.macosx-10.11-intel/egg/cwltool/process.py", line 337, in __init__
    self.validate_hints(kwargs["avsc_names"], self.tool.get("hints", []), strict=kwargs.get("strict"))
  File "build/bdist.macosx-10.11-intel/egg/cwltool/process.py", line 523, in validate_hints
    raise validate.ValidationException(u"Validating hint `%s`: %s" % (r["class"], str(v)))
ValidationException: Validating hint `EnvVarRequirement`: could not validate field `envDef` because
  At position 0
    missing required field `envValue`
modified 2.7 years ago • written 2.7 years ago by skanwal

Okay, I fixed my code above (the - before envValue shouldn't have been there).

One should also be able to write that as:

hints:
 - class: DockerRequirement
   dockerPull: fzkhan/picard-1.136-gatk-2.8
 - class: EnvVarRequirement
   envDef: 
     CLASSPATH: /home/biodocker/bin/GenomeAnalysisTK-2.8-1-g932cd3a/

Using the new mapping shorthand in v1.0.

written 2.7 years ago by Michael R. Crusoe

Hi Michael,

Thanks for your time. The previous error has been solved and a new one has popped up:

Error: Could not find or load main class org.broadinstitute.sting.gatk.CommandLineGATK
Error collecting output for parameter 'output_printReads'
Traceback (most recent call last):
  File "build/bdist.macosx-10.11-intel/egg/cwltool/draft2tool.py", line 330, in collect_output_ports
    ret[fragment] = self.collect_output(port, builder, outdir)
  File "build/bdist.macosx-10.11-intel/egg/cwltool/draft2tool.py", line 405, in collect_output
    raise WorkflowException("Did not find output file with glob pattern: '{}'".format(globpatterns))
WorkflowException: Did not find output file with glob pattern: '['printReads-2016-25-07.bam']'
Error while running job: Error collecting output for parameter 'output_printReads': Did not find output file with glob pattern: '['printReads-2016-25-07.bam']'
[job GATK-PrintReads-copy.cwl] completed permanentFail
[job GATK-PrintReads-copy.cwl] {}
Final process status is permanentFail

The command produced by the tool is:

docker \
    run \
    -i \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta:/var/lib/cwl/stg26cdaa03/hg19.fasta:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.amb:/var/lib/cwl/stg26cdaa03/hg19.fasta.amb:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.ann:/var/lib/cwl/stg26cdaa03/hg19.fasta.ann:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.rbwt:/var/lib/cwl/stg26cdaa03/hg19.fasta.rbwt:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.pac:/var/lib/cwl/stg26cdaa03/hg19.fasta.pac:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.sa:/var/lib/cwl/stg26cdaa03/hg19.fasta.sa:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/indelRealigner-2016-03-07.bai:/var/lib/cwl/stg16e8042e/indelRealigner-2016-03-07.bai:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.fai:/var/lib/cwl/stg26cdaa03/hg19.fasta.fai:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/indelRealigner-2016-03-07.bam:/var/lib/cwl/stg16e8042e/indelRealigner-2016-03-07.bam:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.rsa:/var/lib/cwl/stg26cdaa03/hg19.fasta.rsa:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.bwt:/var/lib/cwl/stg26cdaa03/hg19.fasta.bwt:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.dict:/var/lib/cwl/stg26cdaa03/hg19.dict:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/outputFiles/hg19.fasta.rpac:/var/lib/cwl/stg26cdaa03/hg19.fasta.rpac:ro' \
    '--volume=/Users/sehrish/Google Drive/Analysis using Galaxy, Cpipe and CWL/GATK-worflow/tools/baseRecalibrator-2016-03-07.table:/var/lib/cwl/stg8f0f4f0/baseRecalibrator-2016-03-07.table:ro' \
    --volume=/var/folders/q6/2hgf2qtj1k5gw2vkbfxjn6br0000gp/T/tmpQqNEqX:/var/spool/cwl:rw \
    --volume=/var/folders/q6/2hgf2qtj1k5gw2vkbfxjn6br0000gp/T/tmpeQgLQV:/tmp:rw \
    --workdir=/var/spool/cwl \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/var/spool/cwl \
    --env=PATH=/usr/local/bin/:/usr/bin:/bin \
    fzkhan/picard-1.136-gatk-2.8 \
    java \
    org.broadinstitute.sting.gatk.CommandLineGATK \
    -Xmx4g \
    -Djava.io.tmpdir=/tmp \
    -T \
    PrintReads \
    -R \
    /var/lib/cwl/stg26cdaa03/hg19.fasta \
    -I \
    /var/lib/cwl/stg16e8042e/indelRealigner-2016-03-07.bam \
    -BQSR \
    /var/lib/cwl/stg8f0f4f0/baseRecalibrator-2016-03-07.table \
    -o \
    printReads-2016-25-07.bam

I think the base command is causing the problem, though I am not sure. I have updated the gist as well.

written 2.7 years ago by skanwal

I think the import of envar-global.yml is conflicting with the later hints. Have you tried removing it?

written 2.7 years ago by Michael R. Crusoe

Yes, I have tried removing the import but the error is exactly the same.

Error: Could not find or load main class org.broadinstitute.sting.gatk.CommandLineGATK
Error collecting output for parameter 'output_printReads'
written 2.7 years ago by skanwal
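
Two details in the last generated command may be worth checking. What follows is a sketch under stated assumptions, not a confirmed fix. First, the docker run line carries --env entries for TMPDIR, HOME and PATH but none for CLASSPATH, which suggests the EnvVarRequirement hint is not reaching the job. Second, java treats a CLASSPATH entry that names a directory as a place to look for .class files, not for jars stored inside it, and it only reads options such as -Xmx4g when they come before the main class name. Assuming the jar really sits at the path used with -jar in the first command of this thread, the relevant fragment might look like this:

```yaml
# Sketch only: assumes GenomeAnalysisTK.jar is at the path shown with -jar
# in the first docker command of this thread.
# JVM options precede the main class so that java, not GATK, reads them.
baseCommand:
  - java
  - "-Xmx4g"
  - "-Djava.io.tmpdir=/tmp"
  - org.broadinstitute.sting.gatk.CommandLineGATK
hints:
  - class: DockerRequirement
    dockerPull: fzkhan/picard-1.136-gatk-2.8
  - class: EnvVarRequirement
    envDef:
      - envName: CLASSPATH
        # Name the jar file itself rather than its parent directory.
        envValue: /home/biodocker/bin/GenomeAnalysisTK-2.8-1-g932cd3a/GenomeAnalysisTK.jar
```

If the tool description already passes -Xmx4g and -Djava.io.tmpdir through arguments, those entries would need to be removed to avoid duplicates. Listing EnvVarRequirement under requirements instead of hints is another option to try, since a requirement that cannot be honored is reported as an error rather than silently skipped.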