Question

using parallel program

1

6.1 years ago

saadleeshehreen ▴ 140

I have 10,000 genomes. Analyzing each genome with the following software takes 2-3 minutes. I am using the loop below, and I think it will take about a month to analyze my data. I am looking for a faster way, e.g. using parallel. How can I fit the loop into parallel? Or are there any other suggestions?

cat fna.ls | while read i j; do
   mkdir -p ~/jobs_resfinder/${j%.*} 
   perl ~/res/resfinder.pl -d ~/res/resfinderdb -i ${i} -a all -k 90.00 -l 0.60 -o ~/jobs_resfinder/${j%.*}
done

Here, fna.ls is the list of genomes.

sequence • 1.9k views

ADD COMMENT • link updated 6.1 years ago by Pierre Lindenbaum 161k • written 6.1 years ago by saadleeshehreen ▴ 140

0

Please paste the output of cat fna.ls.

ADD REPLY • link 6.1 years ago by MSM55 ▴ 160

0

There are ~10,000 lines. I have pasted only 2:

/Volumes/scratch/brownlab/chrisbr/DB/RefSeq86/bacteria/G/Geobacteraceae_bacterium_GWC2_53_11-1798316#GCA_001802645.1/GCA_001802645.1_ASM180264v1_genomic.fna    GCA_001802645.1_ASM180264v1_genomic.fna
/Volumes/scratch/brownlab/chrisbr/DB/RefSeq86/bacteria/G/Gammaproteobacteria_bacterium_REDSEA-S21_B8-1811667#GCA_001629445.1/GCA_001629445.1_ASM162944v1_genomic.fna    GCA_001629445.1_ASM162944v1_genomic.fna

ADD REPLY • link updated 6.1 years ago by Pierre Lindenbaum 161k • written 6.1 years ago by saadleeshehreen ▴ 140

0

Please reformat your post as described in the post below.

ADD REPLY • link 6.1 years ago by MSM55 ▴ 160

0

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar whenever you compose or edit a post.

In addition, I converted this thread to a "Question". "Tool" should only be used for announcing new tools.

ADD REPLY • link 6.1 years ago by WouterDeCoster 47k

score 0 · Answer 1 · 2018-03-22

0

6.1 years ago

mohammadhassanj ▴ 260

I think the following links will be useful to you:

https://www.cyberciti.biz/faq/how-to-run-command-or-code-in-parallel-in-bash-shell-under-linux-or-unix/

https://www.gnu.org/software/parallel/parallel_tutorial.html

ADD COMMENT • link 6.1 years ago by mohammadhassanj ▴ 260

0

Thanks. I have no coding background and struggle a lot with it. I googled a lot, but I can't solve this problem. So I am looking for an expert solution!

ADD REPLY • link 6.1 years ago by saadleeshehreen ▴ 140

score 0 · Answer 2 · 2018-03-22

0

6.1 years ago

5heikki 11k

Assuming you have installed GNU parallel, something like this:

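#!/bin/bash

# Number of resfinder jobs to run at the same time
THREADS="16"

# Run resfinder on one genome: $1 = path to the .fna file, $2 = its file name
function restFinderFunction() {
    i="$1"
    j="$2"
    mkdir -p ~/jobs_resfinder/"${j%.*}"
    perl ~/res/resfinder.pl -d ~/res/resfinderdb -i "${i}" -a all -k 90.00 -l 0.60 -o ~/jobs_resfinder/"${j%.*}"
}

# Export the function so the shells started by parallel can see it
export -f restFinderFunction
export THREADS

# -n 2 hands two consecutive input lines to each call of the function
cat fna.ls | parallel -j "$THREADS" -n 2 restFinderFunction {}
#or parallel -j "$THREADS" -n 2 restFinderFunction {} <fna.ls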

For example, this is how -n2 groups the input lines:

$cat file
1
2
3
4
5
6
7
8
9
10

$function joku(){ echo "arg 1:$1 arg2:$2"; }; export -f joku; cat file | parallel -j4 -n2 joku {}
arg 1:1 arg2:2
arg 1:3 arg2:4
arg 1:5 arg2:6
arg 1:7 arg2:8
arg 1:9 arg2:10

ADD COMMENT • link 6.1 years ago by 5heikki 11k

0

Thanks a lot. But I am confused on one point. My fna.ls file holds the values for both $i and $j. So, is it right to declare them like this? i="$1" j="$2"

I also tried the following. First, I saved my script as test.sh using nano, then ran the command below. But it still takes the same time. How can I make it faster?

parallel  --eta -j 3 --load 80% -k 'bash test.sh'

ADD REPLY • link 6.1 years ago by saadleeshehreen ▴ 140

0

Because of parallel -n 2, restFinderFunction gets two args. Inside the function they are $1 and $2. You don't need to reassign them to i and j; you can use them directly as well. As for running the script, simply save it, chmod +x it, and execute it: ./script.sh. Don't call it with parallel.
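
The save, chmod +x, execute steps can be tried on a throwaway script first. This is only a sketch: the temporary directory and the one-line script.sh body are hypothetical stand-ins for the real resfinder script.

```shell
# Hypothetical stand-in for the real script, created in a temporary directory
workdir=$(mktemp -d)
printf '#!/bin/bash\necho "script ran"\n' > "$workdir/script.sh"

# Make the file executable, then run it directly (not through parallel)
chmod +x "$workdir/script.sh"
result=$(cd "$workdir" && ./script.sh)
echo "$result"

# Remove the temporary directory again
rm -rf "$workdir"
```

The parallel call belongs inside the script, so wrapping the whole script in a second parallel call only reruns the same script rather than splitting its work.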

You can monitor the jobs with e.g. htop. If IO is the bottleneck, then running in parallel will do you little good.
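
As a rough back-of-the-envelope check, the figures from the question (10,000 genomes, 2 to 3 minutes each) and the 16 jobs used in the script can be turned into run-time estimates. The sketch assumes perfect scaling with the number of jobs, which will not hold if IO is the bottleneck; the results are rounded down by integer division.

```shell
genomes=10000   # number of genomes, from the question
jobs=16         # THREADS value used in the script

for minutes_per_genome in 2 3; do
    # Total minutes when the genomes are processed one after another
    serial_min=$((genomes * minutes_per_genome))
    # Ideal case only: work divided evenly over all jobs, no IO limit
    parallel_min=$((serial_min / jobs))
    echo "${minutes_per_genome} min/genome: serial ~$((serial_min / 1440)) days, ${jobs} jobs ~$((parallel_min / 60)) hours"
done
```

If the measured run time with 16 jobs is far above this ideal figure, that would point to IO or memory rather than CPU as the limit.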

ADD REPLY • link 6.1 years ago by 5heikki 11k

0

Hi,

I tried your script. It creates a directory, but the directory is empty. It also produces another directory named "Network". I can't figure out the reason. The main problem is that the Perl script is not executed, so there is no output in the directory.

Any suggestions?

ADD REPLY • link 6.1 years ago by saadleeshehreen ▴ 140

0

If your data is in the format:

arg1<tab>arg2
arg1<tab>arg2

you should change the tabs to newlines before piping to parallel, e.g.

cat fna.ls | tr "\t" "\n" | parallel ...

The script was written for data in the format below:

arg1
arg2
arg1
arg2

ADD REPLY • link 6.1 years ago by 5heikki 11k
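
The conversion can be checked on a small sample before running all 10,000 genomes. In this sketch the two-line sample file and its paths are made up, and echo stands in for the real function. The last command uses GNU parallel's --colsep option as an alternative that splits the tab-separated columns without tr; both parallel calls are skipped if parallel is not installed.

```shell
# Hypothetical two-line sample standing in for fna.ls (path<TAB>file name)
sample=$(mktemp)
printf '/data/a/genomeA.fna\tgenomeA.fna\n/data/b/genomeB.fna\tgenomeB.fna\n' > "$sample"

# Tabs become newlines, so each record now occupies two consecutive lines
converted=$(tr '\t' '\n' < "$sample")
echo "$converted"

if command -v parallel >/dev/null 2>&1; then
    # Same pairing as in the script: -n 2 takes two lines per job
    tr '\t' '\n' < "$sample" | parallel -j 2 -n 2 echo pair: {}
    # Alternative: keep the file as it is and let parallel split the columns
    parallel -j 2 --colsep '\t' echo 'path={1} name={2}' < "$sample"
fi

rm -f "$sample"
```

With --colsep, the function call in the script would become restFinderFunction {1} {2} and the -n 2 option would no longer be needed.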

0

Thanks a lot. It works! :)

ADD REPLY • link 6.1 years ago by saadleeshehreen ▴ 140

score 0 · Answer 3 · 2018-03-22

0

6.1 years ago

Pierre Lindenbaum 161k

Use a Makefile (it should work, but I cannot test it without your data/software).

Run it in parallel using make's -j <jobs> option:

make -j 16

ADD COMMENT • link 6.1 years ago by Pierre Lindenbaum 161k