Getting error with awk when using parallel processing in bedtools
4.8 years ago

I have 44 .tsv files in one folder and I want to count the intersections for every pair of files with the intersect command of bedtools. Each output has 4 columns, and I only need to save the sum of column 4 for each pair. This works when I run the pairs one by one, but when I use parallel processing to run everything at once I get a syntax error.

Here is the code and the result when I run a single pair manually:

$ bedtools intersect -a p1.tsv -b p2.tsv -c
chr1 1 5 1
chr1 8 12 1
chr1 18 20 1
chr1 21 25 0

$ bedtools intersect -a p1.tsv -b p2.tsv -c | awk '{sum+=$4} END {print sum}'
3

Here is the code and the error when I use parallel processing:

$ parallel "bedtools intersect -a {1} -b {2} -c |awk '{sum+=$4} END {print sum}'> {1}.{2}.intersect" ::: `ls *.tsv` ::: `ls *.tsv`

awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error
awk: cmd. line:1:{sum+=} END {print sum}
awk: cmd. line:1:            ^ syntax error

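Within double quotes, $ is interpreted by the shell as the start of a variable, so $4 is expanded (to an empty string) before awk ever sees it. That is why the error message shows {sum+=}. The $ must be escaped: https://unix.stackexchange.com/questions/162476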
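To illustrate the point about double quotes, here is a small sketch that needs neither bedtools nor parallel. The variable names `fixed` and `result` and the two-line sample input are made up for the demonstration; only the awk program comes from the question.

```shell
#!/usr/bin/env bash
# Inside double quotes the calling shell expands $4 itself, so awk would
# receive '{sum+=} END {print sum}'. Escaping the dollar sign as \$4
# passes a literal $4 through to awk.
fixed="awk '{sum+=\$4} END {print sum}'"
echo "$fixed"        # shows the command string that the inner shell will run

# Run the escaped command string through a second shell, which is roughly
# what parallel does with its command template.
result=$(printf 'chr1 1 5 1\nchr1 8 12 2\n' | bash -c "$fixed")
echo "$result"

# The same escape applied to the command from the question (not run here,
# since it needs bedtools and GNU parallel):
# parallel "bedtools intersect -a {1} -b {2} -c | awk '{sum+=\$4} END {print sum}' > {1}.{2}.intersect" ::: *.tsv ::: *.tsv
```

The second `echo` prints the sum of column 4 of the sample input, which shows that awk received `$4` intact.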


I found using GNU parallel and awk tricky because of having to escape quotes and such. How many cores do you have on your computer? If it is only two, then I would just go with a non-parallel bash for loop.
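A non-parallel loop along those lines might look like the sketch below. The function name `intersect_all_pairs` and the output naming are illustrative choices, not something from the thread, and skipping a file paired with itself is an assumption about what is wanted. Printing `sum+0` makes awk output 0 instead of an empty line when there is no input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every ordered pair of .tsv files in a directory,
# write the column-4 sum of `bedtools intersect -c` to <a>.<b>.intersect.
intersect_all_pairs() {
  dir="${1:-.}"
  for a in "$dir"/*.tsv; do
    for b in "$dir"/*.tsv; do
      # Skip unmatched globs (no .tsv files) and self-comparisons.
      [ -e "$a" ] && [ -e "$b" ] && [ "$a" != "$b" ] || continue
      # Single quotes around the awk program keep $4 away from the shell.
      bedtools intersect -a "$a" -b "$b" -c \
        | awk '{sum+=$4} END {print sum+0}' \
        > "${a%.tsv}.$(basename "${b%.tsv}").intersect"
    done
  done
}
```

Calling `intersect_all_pairs .` in the folder with the 44 files would process all pairs one after another. Because the awk program sits in plain single quotes here, no escaping is needed.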

4.8 years ago
ATpoint 81k

It is probably the quoting that messes things up. For simplicity, it is better to write the part you want to parallelize as a function and then run that function with parallel:

function PL {

  ## Skip the pair if both input files are the same:
  if [[ "$1" == "$2" ]]; then return 0; fi

  ## Intersect, then sum column 4 (single quotes keep $4 intact for awk):
  bedtools intersect -a "${1}" -b "${2}" -c | awk '{sum+=$4} END {print sum}' > "${1%.*}.${2%.*}.intersect"
}; export -f PL

parallel PL ::: *.tsv ::: *.tsv