Blast Formatdb, Multiple Folders/Directories
12.9 years ago
Me_In_Oz ▴ 30

I have downloaded viral genomes from the NCBI website, and I want to use formatdb to create a BLAST database from them. However, they are spread across nearly 300 folders, each containing the FASTA files for one genome or its genome segments (named by NC_ IDs). The formatdb documentation says that to format multiple files I must quote the list of input files. I do not have a list of these files, as they are inside the folders. Is there any way to run formatdb on multiple folders/directories rather than on the individual FASTA files?

Any help much appreciated :-) Thanks

blast • 4.6k views
12.9 years ago

If you're using Linux:

cat `find /path/to/sequence/root -name "*.fasta"` > all_sequences.fasta
formatdb .... -i all_sequences.fasta (or makeblastdb)

If you're using Windows it's probably time to switch ;-)


Tell me if you get an "Argument list too long" error; in that case the command needs a slight modification.
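
For what it's worth, one common way around that error is to let find hand the file names to cat in batches instead of expanding them all on a single command line. This is only a sketch of that idea, not necessarily the modification the poster had in mind; the function name concat_fasta is made up for illustration, and it assumes the files end in .fasta as in the answer above.

```shell
# Concatenate every *.fasta file under a directory tree into one file.
# find passes the file names to cat in batches ("-exec ... {} +"),
# so the shell never has to build one huge argument list.
# concat_fasta is a hypothetical helper name, not part of BLAST.
concat_fasta() {
    root=$1   # top-level directory holding the genome folders
    out=$2    # combined file to write; keep it outside "$root"
              # (or give it a non-.fasta name) so find does not pick it up
    find "$root" -type f -name '*.fasta' -exec cat {} + > "$out"
}

# Example call (placeholder path, as in the answer above):
# concat_fasta /path/to/sequence/root all_sequences.fasta
```

The combined file can then be given to formatdb (or makeblastdb) with -i exactly as in the answer.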

12.9 years ago
Jan Kosinski ★ 1.6k

or:

formatdb -i "`find ./ -iname '*.fasta' | perl -p -e "s/\n/ /"`" -n all.db

While doing it all in one command is nice, it makes it more difficult to diagnose errors ;-)
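
To illustrate the point, here is a rough sketch of a step-by-step variant that keeps the intermediate file list and prints two counts to check before running formatdb. The name build_combined and its three arguments are invented for this example, and it assumes file names contain no newlines.

```shell
# Build the combined FASTA file in separate, inspectable steps.
# build_combined is a hypothetical helper name, not part of BLAST.
build_combined() {
    root=$1   # top-level directory holding the genome folders
    out=$2    # combined FASTA file to write (outside "$root")
    list=$3   # text file that records which inputs were used

    # Step 1: save the list of input files so it can be checked by eye.
    find "$root" -type f -name '*.fasta' | sort > "$list"
    echo "files found: $(wc -l < "$list" | tr -d ' ')"

    # Step 2: append each listed file to the combined output.
    : > "$out"
    while IFS= read -r f; do
        cat "$f" >> "$out"
    done < "$list"

    # Step 3: count FASTA header lines as a quick sanity check.
    echo "sequences written: $(grep -c '^>' "$out")"
}

# Example call (placeholder path):
# build_combined /path/to/sequence/root all_sequences.fasta fasta_files.txt
```

If the two counts look wrong, the saved list shows which files were picked up, which is harder to see when everything is packed into one command.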


I agree completely ;-)

12.9 years ago
Tigtogs • 0

Thanks for the responses, I think I've sorted it now :-)


If one of the answers solved your problem, please accept it by clicking on the checkmark next to it.
