Question: prokka for several files at once
12 months ago by
Bioinfosenhaji20 wrote:

I would like to know how to run prokka on a folder that contains several .fasta files from several genomes that need to be annotated. Thank you for any ideas.

annotation • 837 views
modified 12 months ago by Mensur Dlakic9.1k • written 12 months ago by Bioinfosenhaji20

What have you tried? How does prokka take a FASTA file as an input argument? Can multiple files be provided? If one file is expected, can process substitution be used? You should ask and exhaust these questions yourself.

written 12 months ago by Ram32k

Now I'm intrigued about how you intend to use process substitution for this...

written 12 months ago by cschu1812.6k

If only one file is expected with a parameter, say -f, you can use -f <(cat file1 file2 file3), and that is how process substitution can be of value here. The point of my comment was to get OP to think and invest some effort on how to solve their problem.

modified 12 months ago • written 12 months ago by Ram32k

Ok makes sense, but that would mix them, which may be reasonable or not, depending on what's in the files. Was just curious.
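The idea discussed in this exchange can be sketched in bash as follows. This is only an illustration: grep stands in for any program that accepts exactly one input file, the file names and sequences are made-up placeholders, and whether prokka itself accepts input supplied this way has not been checked here. The sketch also shows the mixing mentioned above, since the program sees all records as one combined input.

```shell
#!/usr/bin/env bash
# Process substitution demo (bash-specific syntax; not available in plain sh).
set -euo pipefail

# Three tiny placeholder FASTA files in a throwaway directory (hypothetical data).
workdir=$(mktemp -d)
printf '>seq1\nACGT\n' > "$workdir/file1.fasta"
printf '>seq2\nGGCC\n' > "$workdir/file2.fasta"
printf '>seq3\nTTAA\n' > "$workdir/file3.fasta"

# <(cat ...) expands to a path that the program opens like an ordinary file.
# grep stands in for a program that accepts exactly one input file.
headers=$(grep '^>' <(cat "$workdir/file1.fasta" "$workdir/file2.fasta" "$workdir/file3.fasta"))

# Count the header lines that arrived through the single combined input.
n_records=$(printf '%s\n' "$headers" | wc -l | tr -d ' ')

echo "records in the single combined input: $n_records"
printf '%s\n' "$headers"
```

Because the three files reach the program as one stream, any per-file identity is lost, which is why this only makes sense when the sequences are meant to be processed together.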

written 12 months ago by cschu1812.6k
12 months ago by
Dave Carlson520 (Stony Brook University, NY) wrote:

I'm assuming that you want to annotate each FASTA file separately. If that's correct, then you should be able to do this relatively easily with GNU parallel.

According to the GitHub page, the simplest prokka usage is just:

prokka <input fasta file>

Therefore, to run prokka on several input FASTA files simultaneously, you can hand the file names to GNU parallel. For example (assuming the FASTA files are in your current working directory):

ls *.fasta | parallel --verbose "prokka --outdir {.}_out --prefix {.} {}"

In the above command, each FASTA file name is piped to parallel, which will launch a separate prokka analysis for each of those files. The output file names will be based on the input file names, with the ".fasta" extension removed. The "--verbose" flag will print the prokka command for each input file to the screen, which makes it easier to understand exactly what is going on.

Note that I have not tested the above command, so you might consider adding the "--dry-run" flag. This will print out the commands to be run without actually running them.
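For readers who want to preview the per-file commands without GNU parallel or prokka installed, here is a minimal sketch that imitates a dry run with a plain loop. The genome file names are hypothetical placeholders, and giving each genome its own folder through prokka's --outdir option, with a matching --prefix, is an assumption of this sketch. Nothing in it calls prokka or parallel; it only builds and prints the command lines.

```shell
#!/usr/bin/env bash
# Imitates a dry run: build one prokka command per FASTA file and print it
# instead of executing it. Neither prokka nor parallel is invoked.
set -eu

# Throwaway directory with empty placeholder files (hypothetical names).
workdir=$(mktemp -d)
touch "$workdir/genomeA.fasta" "$workdir/genomeB.fasta"
cd "$workdir"

commands=""
for fasta in *.fasta; do
    # Strip the extension, which is what parallel's {.} does for these names.
    base="${fasta%.fasta}"
    # Assumption of this sketch: one output folder and one prefix per genome.
    cmd="prokka --outdir ${base}_out --prefix ${base} ${fasta}"
    commands="${commands}${cmd}
"
done

# Show the commands that would be run, one per line.
printf '%s' "$commands"
```

Once the printed commands look right, the same loop could run them one after another by executing each command instead of collecting it, at the cost of losing the parallelism.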

You can find many great GNU parallel examples here.

Of course, if you didn't actually want to annotate these genomes separately, then the above approach will not be what you want.

written 12 months ago by Dave Carlson520
12 months ago by
Mensur Dlakic9.1k wrote:

A similar topic was discussed a couple of days ago; see here.

written 12 months ago by Mensur Dlakic9.1k