Question: Use filename found in an input Directory in a CommandLineTool
alanh wrote (8 months ago):

We're writing CWL descriptions for a couple of different tools that want a FASTA reference but refer to it in different ways. For example, STAR takes a --genomeDir as input and searches that directory for the files it expects, whereas BWA and Picard take the actual FASTA filename as input and then look for the associated indices derived from that filename.

I presume that I can use JavaScript or some other glob method on the input directory to search for *.fasta or *.fa and insert the filenames, but I don't know how to do that.

Currently I can handle the BWA/Picard cases with something like this (Picard example):

  - id: reference_fasta
    type: File
    secondaryFiles:  [".amb",".ann",".bwt",".fai",".pac",".sa"]
    inputBinding: 
      position: 1
      prefix: R=
      separate: false

And the STAR case:

  - id: genomeDir
    type: Directory
    inputBinding:
      position: 1
      prefix: "--genomeDir"

I'd like to be able to use the genomeDir parameter in both cases and have it search for something like *.fasta or *.fa and dump that into the input lines.

E.g., the directory for a hypothetical "ref.fa" might contain something like this:

 ref.fa # BASE fasta
 ref.fa.amb ref.fa.ann ref.fa.bwt ref.fa.fai ref.fa.pac ref.fa.sa # BWA/Samtools indices
 SA SAindex Genome # STAR indices
 ref.dict  # Picard .dict
 ref.genome # Bedtools Genome file

We would have a separate directory for each genome (e.g. mm10, hg38, etc.), and we could just pass the directory name into the CWL whether the tool called for the FASTA file or the path to the directory.

  - id: genomeDir
    type: Directory
    inputBinding:
      position: 1
      prefix: R=
      separate: false
      valueFrom: |
        ${
          return // MAGIC SEARCH FOR *.fa(sta) happens here
        }

(Pretty new to CWL here and haven't found an example workflow that includes this kind of expansion.)

bogdan.gavrilovic wrote (8 months ago):

Hi. The Directory input has a listing property, which you can access in a JavaScript expression as inputs.genomeDir.listing. This returns a list of all the File and Directory objects contained in that directory. These objects have the same properties as regular File objects (path, basename...).

So, for example, in your case you can get the .fa file path with something like this:

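${
    // Scan the directory listing for the first *.fa or *.fasta file.
    var file_list = inputs.genomeDir.listing;
    for (var i = 0; i < file_list.length; i++) {
        if (/\.(fa|fasta)$/.test(file_list[i].basename)) {
            return file_list[i].path;
        }
    }
    return null;  // no FASTA file found in the directory
}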
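
To try the lookup outside a CWL runner, the same logic can be sketched as a standalone function run against a mocked inputs object. The helper name findFasta, the /refs/mm10 paths, and the mock listing are all hypothetical; only the listing, basename, path, and class fields come from the CWL File and Directory objects. Inside a tool, the expression also needs InlineJavascriptRequirement, and as far as I know CWL v1.1 and later only populate listing when the tool asks for it (for example with loadListing: shallow_listing on the input), so check that if listing comes back undefined.

```javascript
// Hypothetical helper: return the path of the first FASTA file found in a
// CWL Directory listing, or null if the listing contains none.
function findFasta(listing) {
    for (var i = 0; i < listing.length; i++) {
        var entry = listing[i];
        // Skip sub-directories and match on basename, so the parent path
        // cannot influence the result.
        if (entry.class === "File" && /\.(fa|fasta)$/.test(entry.basename)) {
            return entry.path;
        }
    }
    return null;
}

// Mock of the `inputs` object a runner would supply (paths are made up).
var inputs = {
    genomeDir: {
        class: "Directory",
        path: "/refs/mm10",
        listing: [
            { class: "File", basename: "Genome", path: "/refs/mm10/Genome" },
            { class: "File", basename: "ref.fa.fai", path: "/refs/mm10/ref.fa.fai" },
            { class: "File", basename: "ref.fa", path: "/refs/mm10/ref.fa" },
            { class: "File", basename: "ref.dict", path: "/refs/mm10/ref.dict" }
        ]
    }
};

console.log(findFasta(inputs.genomeDir.listing));
```

Anchoring the pattern at the end of the basename keeps index files such as ref.fa.fai from being picked up in place of the FASTA itself.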