Question: using parallel program
saadleeshehreen wrote, 7 months ago:

I have 10,000 genomes. Analyzing each genome with the following software takes 2-3 minutes. I am using the loop below and I think it will take about a month to analyze my data. I am looking for a faster way, e.g. using parallel. How do I fit the loop into parallel? Or are there any other suggestions?

cat fna.ls | while read -r i j; do
   mkdir -p ~/jobs_resfinder/"${j%.*}"
   perl ~/res/resfinder.pl -d ~/res/resfinderdb -i "${i}" -a all -k 90.00 -l 0.60 -o ~/jobs_resfinder/"${j%.*}"
done

where fna.ls is the list of genomes.

modified 7 months ago by Pierre Lindenbaum • written 7 months ago by saadleeshehreen

Please paste the output of cat fna.ls

written 7 months ago by MSM5570

There are ~10,000 lines. I pasted only 2:

/Volumes/scratch/brownlab/chrisbr/DB/RefSeq86/bacteria/G/Geobacteraceae_bacterium_GWC2_53_11-1798316#GCA_001802645.1/GCA_001802645.1_ASM180264v1_genomic.fna    GCA_001802645.1_ASM180264v1_genomic.fna
/Volumes/scratch/brownlab/chrisbr/DB/RefSeq86/bacteria/G/Gammaproteobacteria_bacterium_REDSEA-S21_B8-1811667#GCA_001629445.1/GCA_001629445.1_ASM162944v1_genomic.fna    GCA_001629445.1_ASM162944v1_genomic.fna
modified 7 months ago by Pierre Lindenbaum • written 7 months ago by saadleeshehreen

Please reformat the post as described in the post below.

written 7 months ago by MSM5570

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post, that button is in your toolbar.

In addition, I converted this thread to a "Question". "Tool" should only be used for announcing new tools.

written 7 months ago by WouterDeCoster
mohammadhassanj wrote, 7 months ago:

I think the following links will be useful to you:

https://www.cyberciti.biz/faq/how-to-run-command-or-code-in-parallel-in-bash-shell-under-linux-or-unix/

https://www.gnu.org/software/parallel/parallel_tutorial.html

written 7 months ago by mohammadhassanj

Thanks. I have no coding background and struggle a lot with it. I googled a lot, but couldn't solve this problem, so I am looking for an expert solution!

written 7 months ago by saadleeshehreen
5heikki (Finland) wrote, 7 months ago:

Assuming you have installed GNU parallel, something like this:

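#!/bin/bash

THREADS="16"

# Expects the input one argument per line: path, then file name, alternating.
function restFinderFunction() {
    i="$1"   # full path to the genome
    j="$2"   # file name, used to name the output directory
    mkdir -p ~/jobs_resfinder/"${j%.*}"
    perl ~/res/resfinder.pl -d ~/res/resfinderdb -i "${i}" -a all -k 90.00 -l 0.60 -o ~/jobs_resfinder/"${j%.*}"
}

export -f restFinderFunction
export THREADS

# -n 2 passes two consecutive input lines to each call of the function
cat fna.ls | parallel -j "$THREADS" -n 2 restFinderFunction {}
#or parallel -j "$THREADS" -n 2 restFinderFunction {} <fna.ls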

It works like this:

$ cat file
1
2
3
4
5
6
7
8
9
10

$ function joku(){ echo "arg 1:$1 arg2:$2"; }; export -f joku; cat file | parallel -j4 -n2 joku {}
arg 1:1 arg2:2
arg 1:3 arg2:4
arg 1:5 arg2:6
arg 1:7 arg2:8
arg 1:9 arg2:10
modified 7 months ago • written 7 months ago by 5heikki

Thanks a lot. But I am confused on one point. My fna.ls file is the list for $i and $j. So, is it right to declare them like that? i="$1" j="$2"

I also tried it like this: first, I wrote my script in test.sh with nano, then I ran the following command. But it still takes the same time. How do I make it faster?

parallel  --eta -j 3 --load 80% -k 'bash test.sh'
written 7 months ago by saadleeshehreen

Because of parallel's -n 2, restFinderFunction gets two args. To the function they're $1 and $2. You don't need to reassign them to i and j; you can use them directly as well. As for running the script, you simply save it, chmod +x it and execute it: ./script.sh. Don't call it with parallel.
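
To illustrate the point about $1 and $2, here is a minimal sketch of a function that uses its positional arguments directly. The function name describe_pair and the example path are made up for illustration; it only prints what would be used and does not call resfinder.pl.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only: prints the values instead of running perl.
describe_pair() {
    # $1 = full path to the genome, $2 = its file name
    echo "input=$1 outdir=jobs_resfinder/${2%.*}"
}

# Called by hand here; under parallel -n 2 the two args would come from the input.
result="$(describe_pair /data/GCA_000000000.1_genomic.fna GCA_000000000.1_genomic.fna)"
echo "$result"
```

The ${2%.*} expansion strips the last extension (.fna) from the file name, which is how the output directory gets its name.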

You can monitor stuff with e.g. htop. If IO is the bottleneck, then running in parallel will do you little good.
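
To put rough numbers on the best case, here is a back-of-the-envelope sketch. It assumes 3 minutes per genome (the upper end of the 2-3 minutes from the question), 16 jobs as in THREADS above, and perfect scaling, i.e. enough free cores and no IO bottleneck. Real runs will likely be slower than this estimate.

```shell
#!/bin/bash
# Assumed inputs: taken from the question and the script above, not measured.
genomes=10000
minutes_per_genome=3
jobs=16

serial_minutes=$(( genomes * minutes_per_genome ))   # one genome at a time
parallel_minutes=$(( serial_minutes / jobs ))        # ideal case: jobs never wait

# Integer division, so both figures are rounded down.
serial_days=$(( serial_minutes / 1440 ))
parallel_hours=$(( parallel_minutes / 60 ))

echo "serial: about ${serial_days} days"
echo "with ${jobs} jobs: about ${parallel_hours} hours (best case)"
```

If htop shows the cores mostly idle while the jobs run, the disk is probably the limit and the real time will be closer to the serial figure.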

modified 7 months ago • written 7 months ago by 5heikki

Hi,

I tried your script. It generates a directory, but the directory is empty. It also produces another directory named "Network". I can't figure out the reason. The main problem is that it can't execute the Perl script, so there is no output in the directory.

Any suggestions?

written 7 months ago by saadleeshehreen

If your data is in format:

arg1<tab>arg2
arg1<tab>arg2

You should actually change the tabs to newlines before piping to parallel, e.g.

cat fna.ls | tr "\t" "\n" | parallel ...

The script was written for data that was in format like below:

arg1
arg2
arg1
arg2
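
If you want to check the reshaping step without GNU parallel or resfinder installed, here is a small self-contained sketch. Everything in it is an assumption for illustration: the genome names are made up, a stand-in command replaces the perl call, and xargs -n 2 -P 2 (the -P option is available in GNU and BSD xargs) is swapped in for parallel -n 2 just to show the pairing.

```shell
#!/bin/bash
# Sketch only: a stand-in command replaces resfinder.pl, and all paths are temporary.
workdir="$(mktemp -d)"
export OUT="$workdir/jobs"

# Two-column, tab-separated list in the same shape as fna.ls (hypothetical names).
printf '%s\t%s\n' \
    "$workdir/a/genome_A.fna" "genome_A.fna" \
    "$workdir/b/genome_B.fna" "genome_B.fna" > "$workdir/fna.ls"

# Tabs to newlines: one argument per line, path then file name, alternating.
tr '\t' '\n' < "$workdir/fna.ls" > "$workdir/fna.oneperline"

# Regroup the lines in pairs and run up to 2 jobs at once.
# $1 is the input path, $2 the file name; the real perl call would go here.
xargs -n 2 -P 2 sh -c '
    mkdir -p "$OUT/${2%.*}"
    printf "%s\n" "$1" > "$OUT/${2%.*}/input_path.txt"
' _ < "$workdir/fna.oneperline"

ls "$OUT"
```

Note that xargs splits on any whitespace, so this sketch would break on paths that contain spaces; the paths shown in the question do not.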
written 7 months ago by 5heikki

Thanks a lot. It works! :)

written 6 months ago by saadleeshehreen
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 7 months ago:

Using a Makefile (should work; I cannot test it without your data/software).

Run it in parallel using the -j <jobs> option of make:

make -j 16
written 7 months ago by Pierre Lindenbaum