Question: Calling Python Script From Bash Console While Using GNU Parallel
8.0 years ago by
Argentina/Buenos Aires/Universidad Nacional de Quilmes & Fundación Instituto Leloir
NPalopoli280 wrote:

I am trying to run a Python script that processes many FASTA files in parallel using GNU parallel 20110722. As can be seen below, none of the invocations I tried works. (^C marks the point where I interrupted the job with Ctrl+C because there was no response from the system.)

me@kubuntu:~/Programs/LeitMotifsParallel$ parallel python {1} :::: <(echo
me@kubuntu:~/Programs/LeitMotifsParallel$ parallel python {1} ::: <(echo
File "/dev/fd/63", line 1
SyntaxError: invalid syntax
me@kubuntu:~/Programs/LeitMotifsParallel$ parallel
parallel: Input is tty. Press CTRL-D to exit.

However, the Python script runs as expected when run directly from the console.

/home/me/Programs/LeitMotifsParallel/ DeprecationWarning: the sets module is deprecated
import sets #@UnusedImport
Start : 18:59:34 11Aug2011

Though I have watched the two tutorials on GNU Parallel on YouTube and have gone through the many examples in the README file, I haven't been successful in finding an answer for this situation, so I would really appreciate it if you could help me solve this.

python parallel bash • 8.8k views
modified 8.0 years ago by tange190 • written 8.0 years ago by NPalopoli280

Note that you can also use threading really easily from within Python:

written 8.0 years ago by Michael Schubert6.9k

Not sure this is a bioinformatics question. What are the ":::" doing? Did you try "parallel python --"?

written 8.0 years ago by brentp23k

Is it possible/necessary to put the python call into a separate bash file (e.g. It might be a problem with the input/output redirection or piping...

written 8.0 years ago by Cjt370

Thanks all for the suggestions. Though I don't think it can be strictly categorized as a bioinformatics question, I assumed it might be relevant for the community.
brentp: The ":::" are used by GNU Parallel for specifying arguments on the command line. I tried your option, but the "Input is tty" line shows it does not help.
cjt: I tried calling Python from a separate bash file and using parallel to call that file, but that does not work either.
Michael Schubert: I would definitely use threading from within Python in other cases, but for this job I need to use parallel.

written 8.0 years ago by NPalopoli280
8.0 years ago by
tange190 wrote:

First, it is good to see a fellow bioinformatician using GNU Parallel.

Secondly, it is good to learn you have watched the 4 intro videos: and that you have browsed through the examples:

GNU Parallel does not parallelize the internals of an existing program. What it can do, however, is call the same program with different arguments. So if you normally do: foo.fasta bar.fasta

you can parallelize this by running 2 in parallel like this:

parallel ::: foo.fasta bar.fasta

If the filenames are hardcoded in your program, GNU Parallel cannot parallelize the task for you. The filenames must be given on the command line.

Alternatively, if the program reads from standard input (stdin) and you normally do:

cat foo.fasta bar.fasta |

then you can parallelize this by using GNU Parallel --pipe:

cat foo.fasta bar.fasta | parallel --pipe --block 10M --recstart '>'

This will chop the input into chunks of roughly 10 MB and pass each one to the program on standard input (stdin). The input is only split at a '>', which is where a FASTA record starts.
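For illustration, here is a minimal sketch of a script that could sit at the receiving end of such a pipe. It is not the script from the question: the function name read_fasta and the tab-separated output are assumptions made for this example. It reads FASTA records from stdin and prints each header with its sequence length, so every chunk handed over by --pipe is handled independently.

```python
import sys
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper for this sketch, not part of any library.
    """
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None and line:
            # Sequence data may span several lines; collect them all.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def main() -> None:
    if sys.stdin.isatty():
        # Nothing was piped in, so there is nothing to process.
        print("expected FASTA records on stdin", file=sys.stderr)
        return
    for header, sequence in read_fasta(sys.stdin):
        print(f"{header}\t{len(sequence)}")


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as fasta_lengths.py, it could be appended to the command above as: cat foo.fasta bar.fasta | parallel --pipe --block 10M --recstart '>' python fasta_lengths.py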

If what you normally do is:

that is, run a bunch of different programs that happen to have the arguments hardcoded in each of them, then you can do:

parallel ::: MainMult.*.py


ls | grep MainMult | parallel

However, from your description of the Python script it seems you normally do:

(only one program with no arguments and no reading from standard input). In this situation GNU Parallel is unable to parallelize the task. My advice is to change the Python script so that it takes the filenames as its arguments from the command line.
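A minimal sketch of what such a change could look like, using argparse from the standard library. The function count_records is a hypothetical stand-in for the real analysis (for example the Run(...) call in the actual script), and the output format is an assumption made for this example.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def count_records(fasta_path: Path) -> int:
    """Hypothetical stand-in for the real analysis: count FASTA headers."""
    with fasta_path.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(
        description="Process the FASTA files named on the command line."
    )
    # Accept any number of filenames instead of hardcoding one in the script.
    parser.add_argument("fasta", nargs="*", type=Path, help="FASTA file(s)")
    args = parser.parse_args(argv)
    for path in args.fasta:
        print(f"{path}\t{count_records(path)}")


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as process_fasta.py, it could then be run on two files at once with: parallel python process_fasta.py ::: foo.fasta bar.fasta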

modified 8.0 years ago • written 8.0 years ago by tange190

I have seen the files and read the examples but I couldn't make my script work.

The FASTA files are directly called in the file in the following way:
1) A variable is defined pointing to the file: work_file = "/home/me/Prog/Seqs.fasta"
2) The filename is specified as one of many parameters in the proper function call: Run("Seqs.fasta", 7, 100, True, 0.63, True, False, 0.98, False, False)

The point here is that the script runs from the command line, but not when called with parallel.

written 8.0 years ago by NPalopoli280

@NPalopoli: Not sure I understand; are you iterating over lists of FASTA files and function arguments within the script itself?

written 8.0 years ago by James Estevez90

The key is that (and I quote you) "GNU Parallel does not parallelize the internals of an existing program". I've made a separate Python script for each FASTA file and managed to run the program by following your advice and calling: parallel ::: MainMult.*.py. Thanks a lot for your help!

written 8.0 years ago by NPalopoli280