Question: using parallel program
1
gravatar for saadleeshehreen
10 months ago by
saadleeshehreen60 wrote:

I have 10,000 genome.For analyzing each genome, the following software takes 2/3 minutes. I am using the following loop and I think will take ~ a month to analyze my data . I am looking forward a faster way. e.g using parallel. How to fit the loop in parallel? or any other suggestions?

cat fna.ls | while read i j; do
   mkdir -p ~/jobs_resfinder/${j%.*} 
   perl ~/res/resfinder.pl -d ~/res/resfinderdb -i ${i} -a all -k 90.00 -l 0.60 -o ~/jobs_resfinder/${j%.*}
done

Where, fna.ls = list of genomes

sequence • 435 views
ADD COMMENTlink modified 10 months ago by Pierre Lindenbaum116k • written 10 months ago by saadleeshehreen60

Paste out-put of cat fna.ls

ADD REPLYlink written 10 months ago by MSM5580

These is ~10,000 . I paste only 2

/Volumes/scratch/brownlab/chrisbr/DB/RefSeq86/bacteria/G/Geobacteraceae_bacterium_GWC2_53_11-1798316#GCA_001802645.1/GCA_001802645.1_ASM180264v1_genomic.fna    GCA_001802645.1_ASM180264v1_genomic.fna
/Volumes/scratch/brownlab/chrisbr/DB/RefSeq86/bacteria/G/Gammaproteobacteria_bacterium_REDSEA-S21_B8-1811667#GCA_001629445.1/GCA_001629445.1_ASM162944v1_genomic.fna    GCA_001629445.1_ASM162944v1_genomic.fna
ADD REPLYlink modified 10 months ago by Pierre Lindenbaum116k • written 10 months ago by saadleeshehreen60

reformat the post according to below post

ADD REPLYlink written 10 months ago by MSM5580

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

In addition, I converted this thread to a "Question". "Tool" should only be used for announcing new tools.

ADD REPLYlink written 10 months ago by WouterDeCoster35k
0
gravatar for mohammadhassanj
10 months ago by
mohammadhassanj50 wrote:

I think the following links will be useful to you

https://www.cyberciti.biz/faq/how-to-run-command-or-code-in-parallel-in-bash-shell-under-linux-or-unix/

https://www.gnu.org/software/parallel/parallel_tutorial.html

ADD COMMENTlink written 10 months ago by mohammadhassanj50

Thanks. I have no coding background and struggle a lot with it. I googled a lot, but can't solve problem for this one. So, looking for expert solution !

ADD REPLYlink written 10 months ago by saadleeshehreen60
0
gravatar for 5heikki
10 months ago by
5heikki8.0k
Finland
5heikki8.0k wrote:

Assuming you have installed GNU parallel, something like this:

#!/bin/bash

THREADS="16"

function restFinderFunction() {
    i="$1"
    j="$2"
    mkdir -p ~/jobs_resfinder/${j%.*} 
    perl ~/res/resfinder.pl -d ~/res/resfinderdb -i ${i} -a all -k 90.00 -l 0.60 -o ~/jobs_resfinder/${j%.*}
}

export -f restFinderFunction
export THREADS

cat fna.ls | parallel -j "$THREADS" -n 2 restFinderFunction {}
#or parallel -j "$THREADS" -n 2 restFinderFunction {} <fna.ls

Like this

$cat file
1
2
3
4
5
6
7
8
9
10

$function joku(){ echo "arg 1:$1 arg2:$2"; }; export -f joku; cat file | parallel -j4 -n2 joku {}
arg 1:1 arg2:2
arg 1:3 arg2:4
arg 1:5 arg2:6
arg 1:7 arg2:8
arg 1:9 arg2:10
ADD COMMENTlink modified 10 months ago • written 10 months ago by 5heikki8.0k

Thanks a lot . But I am confused in one point . My fna.ls file is the list for $i and $j . So, is it right to declare like that? i="$1" j="$2"

I also tried like that. First, I nano my script in test.sh Then run following code. But still it takes same time. How to make it faster?

parallel  --eta -j 3 --load 80% -k 'bash test.sh'
ADD REPLYlink written 10 months ago by saadleeshehreen60

Because of parallel -n 2 restFinderFunction gets two args. To the function they're $1 and $2. You don't need to reassign them to i and j. You can use them directly as well. What goes for running the script, you simply save it, chmod +x and just execute it: ./script.sh ..don't call it with parallel

You can monitor stuff with e.g. htop. If IO is the bottle neck then running in parallel will do you little good..

ADD REPLYlink modified 10 months ago • written 10 months ago by 5heikki8.0k

Hi,

I tried your script. It can generate a directory but that is empty. And it also produces other directory named " Network". I can't figure out the reason.The main problem is it can't execute the Perl script. So, no output in the directory.

Any suggestion?

ADD REPLYlink written 10 months ago by saadleeshehreen60

If your data is in format:

arg1<tab>arg2
arg1<tab>arg2

You should actually change the tabs to newlines before piping to parallel, e.g.

cat fna.ls | tr "\t" "\n" | parallel ...

The script was written for data that was in format like below:

arg1
arg2
arg1
arg2
ADD REPLYlink written 10 months ago by 5heikki8.0k

thanks a lot . It works! :)

ADD REPLYlink written 10 months ago by saadleeshehreen60
0
gravatar for Pierre Lindenbaum
10 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum116k wrote:

using a Makefile (should work, I cannot test it without your data/software)

run it in parallel using the option -j <jobs> of make

make -j 16
ADD COMMENTlink written 10 months ago by Pierre Lindenbaum116k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1690 users visited in the last hour