Question: Getting error with awk when using parallel processing in bedtools
nastaran.esfahani wrote, 20 days ago:

I have 44 .tsv files in one folder and I want to count the intersections for every pair of files with the bedtools intersect command. Each output has 4 columns, and I only need to save the sum of column 4 for each pair. This works when I run the pairs one at a time, but when I use parallel processing to run everything at once I get a syntax error.

Here is the code and the result when I run a single pair manually:

$ bedtools intersect -a p1.tsv -b p2.tsv -c

chr1 1 5 1
chr1 8 12 1
chr1 18 20 1
chr1 21 25 0

$ bedtools intersect -a p1.tsv -b p2.tsv -c | awk '{sum+=$4} END {print sum}'

3

Here is the code and the error when I use parallel processing:

$ parallel "bedtools intersect -a {1} -b {2} -c |awk '{sum+=$4} END {print sum}'> {1}.{2}.intersect" ::: `ls *.tsv` ::: `ls *.tsv`

awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error

Within double quotes, $ is interpreted by the shell as the start of a shell variable, so $4 is expanded (to nothing) before awk ever sees it. It must be escaped as \$4: https://unix.stackexchange.com/questions/162476

written 20 days ago by Pierre Lindenbaum
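
The escaping described in the comment above can be sketched without bedtools or parallel. This is only an illustration under stated assumptions: the temporary file and its two rows are made-up demo data, and `bash -c` stands in for the shell that GNU parallel starts for each job.

```shell
#!/usr/bin/env bash
set --                      # clear positional parameters so $4 is empty, as in a fresh shell

# Hypothetical demo data: column 4 holds the counts to sum.
demo=$(mktemp)
printf 'chr1 1 5 1\nchr1 8 12 2\n' > "$demo"

# Unescaped: the outer double quotes let the shell expand $4 (empty),
# so awk would receive '{sum+=} END {print sum}', the program from the error message.
broken="awk '{sum+=$4} END {print sum}' $demo"

# Escaped: \$4 survives the outer quotes and reaches awk as $4.
fixed="awk '{sum+=\$4} END {print sum}' $demo"

echo "$broken"
echo "$fixed"

# Run the escaped command string in a child shell.
fixed_sum=$(bash -c "$fixed")
echo "$fixed_sum"

# Applied to the original command, the same escaping would read:
# parallel "bedtools intersect -a {1} -b {2} -c | awk '{sum+=\$4} END {print sum}' > {1}.{2}.intersect" ::: *.tsv ::: *.tsv
```

The first `echo` shows the awk program with `$4` already stripped out, which matches the `{sum+=} END {print sum}` text in the reported error.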

I found using GNU parallel and awk tricky because of having to escape quotes and such. How many cores do you have on your computer? If it is only two, then I would just go with a non-parallel bash for loop.

written 20 days ago by jean.elbers
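
A non-parallel loop of the kind suggested above might look like the following sketch. The helper name `sum_col4` is made up for illustration, the output naming (`a.b.intersect` with the `.tsv` suffix dropped) is an assumption, and the loop is skipped when bedtools is not on the PATH.

```shell
#!/usr/bin/env bash
# Sum column 4 of whatever arrives on stdin (prints 0 for empty input).
sum_col4() { awk '{sum+=$4} END {print sum+0}'; }

# Serial pairwise loop over all .tsv files in the current directory.
if command -v bedtools >/dev/null 2>&1; then
  for a in *.tsv; do
    for b in *.tsv; do
      [ "$a" = "$b" ] && continue   # skip comparing a file with itself
      bedtools intersect -a "$a" -b "$b" -c \
        | sum_col4 > "${a%.tsv}.${b%.tsv}.intersect"
    done
  done
fi
```

Because the awk program sits in single quotes inside a normal script, `$4` needs no escaping here. With 44 files this runs 44 × 43 = 1892 intersections one after another.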
ATpoint (Germany) wrote, 20 days ago:

It is probably the quoting that breaks things. It is simpler to put the part you want to parallelize into a function, export it, and then run that function with parallel:

function PL {
  ## Skip the pair if both input files are the same:
  if [[ "$1" == "$2" ]]; then return 0; fi

  ## Intersect, then write the sum of column 4 to <a>.<b>.intersect:
  bedtools intersect -a "${1}" -b "${2}" -c | awk '{sum+=$4} END {print sum}' > "${1%.*}.${2%.*}.intersect"
}; export -f PL

parallel "PL" ::: *.tsv ::: *.tsv