Question: GNU parallel error: Command line too long
FGV90 wrote:

Dear all,

I've been using GNU parallel for a while and it works quite well. However, I recently needed to run some very long commands and parallel complained that the command line was too long:

parallel: Error: Command line too long (223235 >= 131049) at input 0: cat /tmp/10110507/adadev34gv_213...

It is a bit weird, since my shell seems to support commands longer than 131049:

$ getconf ARG_MAX
2621440

and had no trouble running a very long command:

$ perl -e 'print("true "."x"x10000000);' | bash
$ echo $?
0

Does anyone know why the limit is so low and/or how to change it? Thanks,

gnu parallel • 180 views
modified 7 days ago by ole.tange3.1k • written 11 days ago by FGV90

Is this bioinformatics related?

written 11 days ago by goodez160

Well, it might not seem so at first sight, but I am actually trying to concatenate several FASTA files and align them with MAFFT. :)

written 11 days ago by FGV90

That's sufficient to be bioinformatics related, but since we are a "focussed" forum it would be best if you mention that application in your initial question to remove all doubt.

written 11 days ago by WouterDeCoster29k

Can you print the output of parallel --max-line-length-allowed?

written 11 days ago by microfuge900

Sure, it matches the error message:

$ parallel --max-line-length-allowed
131049
written 11 days ago by FGV90

Can you build your command (or parts of it) programmatically, and echo that as a string to awk '{ print length($0) }' or wc -c etc.? This may help track down which command (or which parts) are too long for parallel.

modified 11 days ago • written 11 days ago by Alex Reynolds24k

As I said in another post, I actually have a script with all the commands that I pipe to parallel, and it is the first line that has 223188 characters. I also noticed that there are other lines that won't work either:

1020419
391943
223188
150854
146331

Strangely, I extracted the first command, turned it into an echo and piped it to bash, and it worked fine.... :/

written 10 days ago by FGV90
ole.tange3.1k wrote:

GNU Parallel pessimistically assumes all characters have to be quoted. For this reason the max line length is half of what you would otherwise expect.

"I have a file with the commands to run (several thousand) and I pipe it to parallel. It seems one of these commands is way too big..."

A command line > 10000 chars - even a generated one - is highly unusual. GNU Parallel normally only hits that limit when copying a big environment (using env_parallel).

Try this to identify the long lines:

grep -E '.{100000}' file_with_commands
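POSIX only guarantees repetition counts up to 255 in an interval expression, so some grep builds may reject a count as large as 100000. A minimal alternative sketch using awk, which also reports the line number and length of each offending command; the sample file and the 100000 threshold are made up for illustration:

```shell
# Hypothetical sample file: two short commands and one 150000-character command.
cmdfile=$(mktemp)
{
    echo 'echo short'
    printf 'cat '; head -c 149996 /dev/zero | tr '\0' 'x'; echo
    echo 'echo also short'
} > "$cmdfile"

# Print "line_number length" for every line longer than the limit.
limit=100000
long_lines=$(awk -v limit="$limit" 'length($0) > limit { print NR, length($0) }' "$cmdfile")
echo "$long_lines"
```

On a real run, replace "$cmdfile" with file_with_commands and set the limit to whatever parallel --max-line-length-allowed reports.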

If they cannot be written shorter, then you can use this workaround: Give each line on stdin to bash one by one:

cat file_with_commands | parallel --pipe -N1 bash

The biggest disadvantage is that --joblog will not make sense, but if you do not use that, then this solution should be OK.
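If --joblog is needed, one possible variation (not part of the answer above, just a sketch) is to write each command into its own small script and let parallel run the scripts, so that every job is started with a short command line. The file names, the job_ prefix and the sample commands are hypothetical; the loop fallback only exists so the sketch runs where GNU Parallel is not installed:

```shell
# Hypothetical command file with one complete command per line.
workdir=$(mktemp -d)
printf '%s\n' 'echo first' 'echo second' 'echo third' > "$workdir/commands.txt"

# Write every line to its own script: job_aaaaa, job_aaaab, ...
# (-a 5 gives five-letter suffixes, enough for many thousands of jobs).
mkdir "$workdir/jobs"
split -l 1 -a 5 "$workdir/commands.txt" "$workdir/jobs/job_"

# Each job is now "bash <short path>", however long the command inside is.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    printf '%s\n' "$workdir"/jobs/job_* |
        parallel -k --joblog "$workdir/joblog" bash {} > "$workdir/out.txt"
else
    # Sequential fallback when GNU Parallel is not available.
    for job in "$workdir"/jobs/job_*; do bash "$job"; done > "$workdir/out.txt"
fi
```

The job list is passed to parallel on stdin with the shell's builtin printf, so the list of script paths is not subject to the argument-length limit either.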

modified 4 days ago • written 7 days ago by ole.tange3.1k

But even if GNU parallel assumes quotes, the maximum argument length is still quite low. According to getconf, I should be able to use 2'621'440 characters (see post above). Why is GNU parallel limit 20 times lower than that?

It seems I have 5 commands with length greater than 100'000 characters. Is there any way to increase or disable this check? thanks,

written 6 days ago by FGV90

The problem is in execve, which has a 128 KB limit. In other words, it is not the same limit as the one you see in getconf ARG_MAX.
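For context: on Linux, the execve(2) manual documents a limit of 32 pages (MAX_ARG_STRLEN, 128 KiB with 4 KiB pages) on each single argument string, separate from the ARG_MAX limit on the total. Assuming parallel hands the whole command to the shell as one argument, that would explain why the same line works when piped to bash. A rough sketch of the difference; the 200000-character string is arbitrary, and other kernels may behave differently:

```shell
# One 200000-character string: more than 128 KiB, far less than a typical ARG_MAX.
long=$(head -c 200000 /dev/zero | tr '\0' 'x')
echo "ARG_MAX: $(getconf ARG_MAX)"

# Passed as ONE argument to an external program (env runs true from PATH).
# Expected to fail on Linux with "Argument list too long"; other kernels may accept it.
single_status=0
env true "$long" 2>/dev/null || single_status=$?

# Passed as a line on bash's stdin: the long string never goes through execve,
# and "true" is a bash builtin, so no external program is started at all.
stdin_status=0
printf 'true %s\n' "$long" | bash || stdin_status=$?

echo "single argument: exit $single_status, via stdin: exit $stdin_status"
```

Because true is a builtin, a test of the form `print "true xxxx..." | bash` does not exercise the kernel limits; a command that starts an external program with many short arguments is only bounded by the total ARG_MAX.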

written 6 days ago by ole.tange3.1k

OK, does that mean that there is no way to increase the execve limit?

What about making GNU parallel more optimistic (and not assume all characters have to be quoted)? :) Would it be possible to have an option for this?

Thanks,

written 6 days ago by FGV90

I have found no way to increase the execve limit.

written 6 days ago by ole.tange3.1k

What about making it more optimistic? :)

written 5 days ago by FGV90

Is the command itself too large, or is it the list of arguments/files that you're passing to it that is exceeding the limit?

written 5 days ago by jrj.healey4.6k

It is the list of arguments that is too large.

written 5 days ago by FGV90

Can you chunk your file list using split or similar? Or is it required for all of the arguments to be in that command?

written 5 days ago by jrj.healey4.6k

Well, the command is basically a cat of several files that is then piped into MAFFT. I guess I could split the cat into several cats and pipe the result at the end, but that is a bit error-prone and I'd like to avoid it if possible.

written 5 days ago by FGV90

Why not cat all the files beforehand, and pass the file either directly to MAFFT, or via STDIN (if you have your heart set on piping)?

A workaround for cat-ing more files than the command line can handle would be to build up a list of the files using find and then -exec, then simply tell it to append the files in the list. You can probably do this with xargs too if you want parallelisation of some form.
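A minimal sketch of the list-file idea, assuming file names without whitespace; the directory layout, the group1 names and the tiny FASTA records are invented for the example. The file names live in a list, so the command that has to be handed to parallel stays short no matter how many files belong to one alignment:

```shell
# Hypothetical input: a directory of small FASTA files.
workdir=$(mktemp -d)
mkdir "$workdir/seqs"
for i in 1 2 3 4 5; do
    printf '>seq%s\nACGT\n' "$i" > "$workdir/seqs/s$i.fasta"
done

# Store the file names in a list instead of on the command line.
find "$workdir/seqs" -type f -name '*.fasta' > "$workdir/group1.list"

# xargs reads the names from the list and starts cat as many times as needed,
# keeping each invocation below the system's argument limits.
xargs cat < "$workdir/group1.list" > "$workdir/group1.fasta"

# Number of sequences in the concatenated file.
grep -c '>' "$workdir/group1.fasta"
```

The per-job command then has the form `xargs cat < group1.list | aligner`, where the aligner call is whatever was previously at the end of the long cat pipeline.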

modified 5 days ago • written 5 days ago by jrj.healey4.6k

But it is exactly the cat that breaks the limit, because I am doing it on several thousand files. I guess I could do the cat directly on the terminal (no parallel), and then use parallel to run all the alignments, since these are the time-intensive steps...

written 5 days ago by FGV90

Yeah, so your problem is not with parallel, it's with the Unix CLI limit, so you need to be a little cleverer about how you're doing it.

Besides, concatenating 10,000 single-line files is the same as concatenating 10 files of 1,000 lines each.

I would use find to build up the list and do the concatenation so that you have a single file ready to go, if you don't want to do the chunking of files manually:

https://unix.stackexchange.com/questions/76418/concatenating-thousands-of-files-vs
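A short sketch of the find-based concatenation, under the assumption that all input files sit below one directory and share a .fasta extension (both hypothetical here). With `-exec ... {} +`, find batches the file names itself and starts cat once per batch, so no single command line grows beyond the system limit:

```shell
# Hypothetical input directory with a few small FASTA files.
workdir=$(mktemp -d)
mkdir "$workdir/seqs"
for i in 1 2 3; do
    printf '>seq%s\nACGT\n' "$i" > "$workdir/seqs/s$i.fasta"
done

# Concatenate everything that matches; the output is written outside the
# searched directory so that it cannot be picked up as an input file.
find "$workdir/seqs" -type f -name '*.fasta' -exec cat {} + > "$workdir/combined.fasta"

# Line count of the combined file.
wc -l < "$workdir/combined.fasta"
```

The order in which find visits the files is not defined, so pipe the names through sort (for example with the list-file approach) if the order of sequences matters.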

written 5 days ago by jrj.healey4.6k

Hmmm, I think it is a parallel issue (or rather execve), since I can run the commands directly on the terminal.

From what I understood, parallel uses execve to run the commands, and that has a much smaller buffer (apparently 20x smaller) than the terminal limit (seen as getconf ARG_MAX).

written 5 days ago by FGV90

Perhaps you're right, but I think my point still stands. I think to expect a significant change in how parallel handles CLI args is wishful thinking (especially for something that is a little bit of an edge case), so you'd be better off coming up with a robust way to get around this. There are loads and loads of threads about the fastest/best way of concatenating large numbers of files etc, so I really would strongly advise you to just rethink your process before you get as far as parallel.

written 5 days ago by jrj.healey4.6k
WouterDeCoster29k wrote:

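The first solution that comes to my mind is to put parts of it in a bash script, e.g. do_stuff.sh

#!/bin/bash
# Usage: do_stuff.sh INPUT VARIABLE OUTPUT
INPUT=$1
VARIABLE=$2
OUTPUT=$3
# Quote the variables so that file names with spaces survive.
command_1 "$INPUT" | command_2 | command_3 "$VARIABLE" > "$OUTPUT"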

and then use that script with parallel:

ls *.fastq | parallel -j 8 'do_stuff.sh {} foobar {.}_output'

I don't know if that fits what you are doing.

written 11 days ago by WouterDeCoster29k

That is actually what I am doing... I have a file with the commands to run (several thousand) and I pipe it to parallel. It seems one of these commands is way too big...

written 11 days ago by FGV90