Question: interesting behavior with grep
26 days ago, ionox0 wrote:

This isn't so much a question as an interesting finding. To get grep to work with the -v flag when every line of the file is filtered out, you need to add || true so that the job doesn't fail because of grep's nonzero exit code.

For example, this tool concatenates several VCF files, but without the || true it will fail if grep removes the only line from a VCF that contains just the header (no variants):

cwlVersion: v1.0

class: CommandLineTool

requirements:
  - class: InlineJavascriptRequirement
  - class: ShellCommandRequirement

arguments:
- head
- -n
- '1'
- $(inputs.vcfs[0].path)

- shellQuote: false
  valueFrom: '>'

- all_calls.vcf

- shellQuote: false
  valueFrom: '&&'

- cat
- $(inputs.vcfs)

- shellQuote: false
  valueFrom: '|'

- grep
- -vP
- "^chr1"

# Need this to prevent nonzero exit code if grep runs on header only
- shellQuote: false
  valueFrom: '||'
- 'true'

- shellQuote: false
  valueFrom: '>>'

- all_calls.vcf

inputs:

  vcfs: File[]

outputs:

  concatenated_vcf:
    type: File
    outputBinding:
      glob: all_calls.vcf
written 26 days ago by ionox0

From the grep manual page

Exit Status: 0 if a line is selected, 1 if no lines were selected, and 2 if an error occurred

So we can use successCodes: [0, 1] to document that, instead of || true, which could hide an error.
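
A minimal shell sketch of the exit statuses quoted above, and of why || true is the blunter of the two fixes. The file names are made up for illustration, and -P is left out because a plain pattern is enough here.

```shell
#!/bin/sh
# Work in a throwaway directory (hypothetical file names).
workdir=$(mktemp -d)
printf '#CHROM\tPOS\n' > "$workdir/header_only.vcf"

# At least one line is selected: exit status 0.
status_selected=0
grep -v '^chr1' "$workdir/header_only.vcf" > /dev/null || status_selected=$?

# Every line is filtered out: exit status 1, which is not really an error.
status_empty=0
grep -v '^#CHROM' "$workdir/header_only.vcf" > /dev/null || status_empty=$?

# The input file is missing: exit status above 1 (2 with GNU grep), a real error.
status_error=0
grep -v '^#CHROM' "$workdir/no_such_file.vcf" > /dev/null 2>&1 || status_error=$?

# '|| true' reports success for the real error as well.
grep -v '^#CHROM' "$workdir/no_such_file.vcf" > /dev/null 2>&1 || true
status_masked=$?

echo "selected=$status_selected empty=$status_empty error=$status_error masked=$status_masked"
rm -rf "$workdir"
```

Accepting only 0 and 1 keeps the missing-file case visible as a failure, whereas || true reports success for it too.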

Also, does

- cat
- $(inputs.vcfs)

really work when vcfs is type: File[]?

written 23 days ago by Michael R. Crusoe

Thanks for the tip. I hadn't considered this feature; successCodes does indeed solve the problem more cleanly.

The command generated from the CWL is the following:

$ /bin/sh \
-c \
'head' '-n' '1' '/scratch/tmpeZmeI2/stg3e5bd4d3-f4b3-40c3-be05-689bb0bcd8cf/Sample_1_Annotated_Evidence-annotated.txt' > 'all_calls.txt' && 'cat' '/scratch/tmpeZmeI2/stg3e5bd4d3-f4b3-40c3-be05-689bb0bcd8cf/Sample_1_Annotated_Evidence-annotated.txt' '/scratch/tmpeZmeI2/stg2676fb6f-d8ed-4ced-86e1-8011f675ed83/Sample_2_Annotated_Evidence-annotated.txt' | 'grep' '-vP' '^TumorId' || 'true' >> 'all_calls.txt'

This looks correct to me: multiple files are being supplied to cat. Is this not recommended?

However, I've realized there is another issue: the output of the second command (after &&) is not redirected to all_calls.txt but still goes to stdout. Perhaps I'm misunderstanding how /bin/sh -c works, but wrapping the second command in a subshell seems to work, although I'm not sure it's recommended:

arguments:
- head
- -n
- '1'
- $(inputs.sv_calls[0].path)

- shellQuote: false
  valueFrom: '>'

- all_calls.txt

- shellQuote: false
  valueFrom: '&&'

# Need to use subshell in order to gather stdout from second command to append to file
- shellQuote: false
  valueFrom: '('

- cat
- $(inputs.sv_calls)

- shellQuote: false
  valueFrom: '|'

- grep
- -vP
- "^TumorId"

# Need this to prevent nonzero exit code if grep runs on header only
- shellQuote: false
  valueFrom: '||'
- 'true'

# Need to use subshell in order to gather stdout from second command to append to file
- shellQuote: false
  valueFrom: ')'

- shellQuote: false
  valueFrom: '>>'

- all_calls.txt
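
A small shell sketch of what seems to be happening here, with made-up sample files: a redirection attaches only to the simple command it follows, so in the ungrouped form >> belongs to true alone, and the output of cat | grep goes to stdout. Grouping with ( ... ) makes the redirection cover the whole list.

```shell
#!/bin/sh
# Hypothetical two-sample input, each with a header line.
workdir=$(mktemp -d)
printf 'TumorId\tGene\nS1\tTP53\n' > "$workdir/sample_1.txt"
printf 'TumorId\tGene\nS2\tKRAS\n' > "$workdir/sample_2.txt"

# Ungrouped: '>>' applies to 'true' only, so grep's output escapes to stdout.
# The command substitution captures that escaped output for inspection.
leaked=$(
  head -n 1 "$workdir/sample_1.txt" > "$workdir/ungrouped.txt" &&
    cat "$workdir/sample_1.txt" "$workdir/sample_2.txt" |
      grep -v '^TumorId' || true >> "$workdir/ungrouped.txt"
)

# Grouped: the subshell's combined stdout is appended to the file.
head -n 1 "$workdir/sample_1.txt" > "$workdir/grouped.txt" &&
  ( cat "$workdir/sample_1.txt" "$workdir/sample_2.txt" |
      grep -v '^TumorId' || true ) >> "$workdir/grouped.txt"

ungrouped_lines=$(wc -l < "$workdir/ungrouped.txt" | tr -d ' ')
grouped_lines=$(wc -l < "$workdir/grouped.txt" | tr -d ' ')
leaked_lines=$(printf '%s\n' "$leaked" | wc -l | tr -d ' ')
echo "ungrouped=$ungrouped_lines grouped=$grouped_lines leaked=$leaked_lines"
rm -rf "$workdir"
```

Braces, as in { ...; } >> file, would group the commands without starting a subshell; either form should work for this purpose.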
written 19 days ago by ionox0

You're in a situation where I would recommend either using a bash script or splitting the work into multiple CommandLineTools.
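
As a rough illustration of the script route, here is a sketch under stated assumptions rather than a tested tool: concat_tables is a hypothetical helper name, and the script assumes every input shares the same one-line header. It treats grep's status 1 (no lines selected) as success and passes anything higher through as a failure.

```shell
#!/bin/sh
# concat_tables OUTPUT HEADER_PATTERN FILE...
# Hypothetical helper: write the header of the first file, then append the
# non-header lines of all files.
concat_tables() {
    ct_output=$1
    ct_pattern=$2
    shift 2

    # Header comes from the first input file.
    head -n 1 "$1" > "$ct_output" || return 1

    # Append everything that is not a header line. The pipeline's status is
    # grep's status, so a failure of cat alone is not detected here.
    ct_status=0
    cat "$@" | grep -v "$ct_pattern" >> "$ct_output" || ct_status=$?

    # 1 means "no lines selected" (header-only inputs); above 1 is an error.
    if [ "$ct_status" -gt 1 ]; then
        return "$ct_status"
    fi
    return 0
}

# Demo with two header-only files (made-up names).
workdir=$(mktemp -d)
printf 'TumorId\tGene\n' > "$workdir/a.txt"
printf 'TumorId\tGene\n' > "$workdir/b.txt"

demo_status=0
concat_tables "$workdir/all_calls.txt" '^TumorId' "$workdir/a.txt" "$workdir/b.txt" || demo_status=$?
demo_lines=$(wc -l < "$workdir/all_calls.txt" | tr -d ' ')
echo "status=$demo_status lines=$demo_lines"
rm -rf "$workdir"
```

A script like this could then be the baseCommand of a single CommandLineTool, which avoids the shellQuote: false arguments altogether.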

I take back my comment about $(inputs.vcfs); I was thinking of something else :-)

written 17 days ago by Michael R. Crusoe