Question: What is the correct way to use GNU parallel with Primer3?
lunchboxwu wrote:

Hi

I need to design primers for around 40,000 sequences. After doing this task with Primer3, I found that it took a very long time.

I tried to speed up Primer3 with GNU Parallel, but I could not get it to split the input file and run several jobs at once. Primer3 still ran on only 1 core.

My command is as follows:

cat fasta.p3in | parallel --round-robin -j 12 --pipe --recend "=" /Tools/primer3/primer3-2.3.6/src/primer3_core > fasta.p3out

Could anyone tell the correct way to use GNU Parallel along with Primer3? Thanks a lot!

written 4.7 years ago by lunchboxwu

What would a typical command line (without parallel) look like? Would it be something like

/Tools/primer3/primer3-2.3.6/src/primer3_core fasta_part1.p3in

As a side note, have you looked through the parallel guide? Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them

written 4.7 years ago by Ying W

Hi, Ying W:

Thanks.
I've read through the GNU Parallel tutorial and the post Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them.
The command I used is based on the BLAT example in that Biostar post.

The command line (without parallel) of primer3 is:

/Tools/primer3/primer3-2.3.6/src/primer3_core fasta.p3in > fasta.p3out

and the records in *.p3in (Primer3 input format) look like this:

SEQUENCE_ID=1
SEQUENCE_TEMPLATE=ATATGGCGATAGTAAAATTTTGAAAAAAAAAAAGAAAAATTTTAGAAGCAAAATTTTCCGTCATCTTGAATTTTGAAAA
PRIMER_PRODUCT_SIZE_RANGE=100-280
SEQUENCE_TARGET=20,17
PRIMER_MAX_END_STABILITY=250
=
SEQUENCE_ID=2
SEQUENCE_TEMPLATE=TTAAATTTAACACAAAACTTTTTACCGTGTGGGAAAATTTCTAATAAACAGGATTTATCAGATTTATCAATTGCAAGAAAA
PRIMER_PRODUCT_SIZE_RANGE=100-280
SEQUENCE_TARGET=20,17
PRIMER_MAX_END_STABILITY=250
=

There's a '=' on its own line at the end of each record.

Any ideas?

written 4.7 years ago by lunchboxwu
ole.tange wrote:

Your biggest mistake was probably that every line of your records contains '=', so --recend "=" can match in the middle of a record; only '\n=\n' is a record separator. Running 'wc' or '--files cat' as the command is great for debugging this kind of problem.
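
The difference between the two separators is easy to check on a small sample before involving GNU Parallel at all. This is only a minimal sketch: the two-record file and its sequences are made up for illustration, and plain grep stands in for the real tools.

```shell
# Made-up two-record sample in Primer3 input format (not the real fasta.p3in).
sample=$(mktemp)
printf 'SEQUENCE_ID=1\nSEQUENCE_TEMPLATE=ACGT\n=\nSEQUENCE_ID=2\nSEQUENCE_TEMPLATE=TTGA\n=\n' > "$sample"

# Lines containing '=' anywhere: every tag line plus every terminator line.
any_eq=$(grep -c '=' "$sample")

# Lines that are exactly '=': the real record terminators.
records=$(grep -c '^=$' "$sample")

echo "lines containing '=': $any_eq, actual records: $records"
rm -f "$sample"
```

On this sample all 6 lines contain '=', but only 2 of them end a record, which is why a bare "=" is not a usable record end.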

Your second mistake is that --block-size defaults to 1M, so the first instance may simply gobble up everything.

This ought to work (untested, as I have neither access to fasta.p3in nor to primer3):

cat fasta.p3in | parallel -N1 --round-robin --pipe --recend "\n=\n" --cat /Tools/primer3/primer3-2.3.6/src/primer3_core > fasta.p3out

You can possibly leave out --cat if primer3 reads from STDIN. If GNU Parallel itself takes up significant time, increase -N1: with 40000 records it is probably OK to split into bigger chunks than 1 record.
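
To see what dealing whole records out to several jobs amounts to, here is a rough sketch in plain awk. It is not GNU Parallel and not what the command above does internally; it only illustrates splitting on lines that are exactly '='. The sample records, the chunk file names, and n=2 are all hypothetical.

```shell
# Hypothetical stand-in input: five minimal records, each ended by a line that is only '='.
workdir=$(mktemp -d)
for i in 1 2 3 4 5; do
  printf 'SEQUENCE_ID=%s\nSEQUENCE_TEMPLATE=ACGTACGT\n=\n' "$i"
done > "$workdir/sample.p3in"

# Deal whole records out to n chunk files in turn.
n=2
awk -v n="$n" -v prefix="$workdir/chunk" '
  { out = prefix "." (rec % n); print > out }  # current record goes to chunk (rec mod n)
  $0 == "=" { rec++ }                           # terminator line: the next record starts
' "$workdir/sample.p3in"

# Each chunk is complete Primer3 input, so one primer3_core per chunk could run at once, e.g.
#   for f in "$workdir"/chunk.*; do primer3_core "$f" > "$f.out" & done; wait
chunk0=$(grep -c '^=$' "$workdir/chunk.0")
chunk1=$(grep -c '^=$' "$workdir/chunk.1")
echo "chunk.0: $chunk0 records, chunk.1: $chunk1 records"
rm -rf "$workdir"
```

With five records and two chunks, records 1, 3 and 5 land in chunk.0 and records 2 and 4 in chunk.1. Note that with any round-robin split the combined output will not be in the original input order.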

written 4.7 years ago by ole.tange

Thank you for your help, ole.tange. You are my lifesaver!

You're right: I should use "\n=\n" as the record delimiter, and I should also set the number of records per job for parallel.

I finally managed to run primer3 with parallel. The command line is as follows:

cat fasta.p3in | parallel -N10 --round-robin --pipe --recend "\n=\n" /Tools/primer3/primer3-2.3.6/src/primer3_core > fasta.p3out
written 4.7 years ago by lunchboxwu