Handle directories downloaded from NCBI for database creation
9.8 years ago

Hello,

I want to create a viral database on my computer. I downloaded all.fna.tar.gz, the archive of all complete viral genomes, from NCBI's FTP site.

I extracted it, which created thousands of directories, each containing the FASTA file(s) for one virus.

I would like to concatenate all these files into one before running formatdb, but I don't know how to go through all the directories and files, in Python or bash, and write their contents into a single new file (with cat or something else).
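Since the question asks how to do this in bash, here is a minimal sketch of a function that walks the extracted directories recursively and concatenates every FASTA file into one output file. The function name concat_fasta and the *.fna pattern are assumptions (the pattern follows the archive's name); adjust them to whatever the extraction actually produced.

```shell
#!/usr/bin/env bash
# concat_fasta SRC_DIR OUT_FILE  (hypothetical helper, not an NCBI tool)
# Recursively finds every *.fna file under SRC_DIR and concatenates them
# into OUT_FILE. Put OUT_FILE outside SRC_DIR so find cannot pick it up.
concat_fasta() {
    local src_dir=$1 out_file=$2
    # -print0 / xargs -0 keep file names containing spaces intact;
    # sort -z makes the order of sequences in OUT_FILE reproducible;
    # xargs -r skips running cat at all if no file matched.
    find "$src_dir" -type f -name '*.fna' -print0 \
        | sort -z \
        | xargs -0 -r cat > "$out_file"
}
```

For example, concat_fasta all_fna_dir all_viruses.fna (both names hypothetical), after which formatdb -i all_viruses.fna -p F should build a nucleotide database from the combined file.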

Thank you,

Tags: genome, bash, database, python
9.8 years ago

makeblastdb can read FASTA from standard input (with -in -), so you can pipe all the files into it directly:

find /path/to/dir -type f -name "*.fna" -exec cat {} + |
    /path/to/ncbi/ncbi-blast-2.2.28+/bin/makeblastdb -dbtype nucl -in - -out dbName -title dbName
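Before piping into makeblastdb, it may be worth confirming that the find pattern actually matches the extracted files: judging by the archive name they end in .fna, so a pattern such as "*.fa" would match nothing and makeblastdb would receive empty input. A minimal sketch of a hypothetical helper, count_fasta_records, that counts the FASTA headers find would send down the pipe:

```shell
#!/usr/bin/env bash
# count_fasta_records DIR [PATTERN]  (hypothetical helper)
# Prints how many FASTA headers (lines starting with '>') the same find
# command would feed to makeblastdb. PATTERN defaults to '*.fna'.
count_fasta_records() {
    local dir=$1 pattern=${2:-'*.fna'}
    # With "-exec ... {} +", find runs cat only if at least one file matched.
    # grep -c still prints 0 (and exits with status 1) when nothing matches.
    find "$dir" -type f -name "$pattern" -exec cat {} + | grep -c '^>'
}
```

count_fasta_records /path/to/dir should print roughly the number of genomes you expect; a result of 0 means the pattern needs changing.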

Thank you! However, when I try this, I get a message that cat terminated with signal 13. Do you know what that means?
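Signal 13 is SIGPIPE. For the pipeline above, cat receiving it most likely means makeblastdb stopped reading its standard input and exited while cat was still writing, so the real error is probably in makeblastdb's own message and exit status. A small bash sketch showing what SIGPIPE looks like and how to read each command's exit status from the PIPESTATUS array (under the default signal handling, the killed writer typically reports 141, i.e. 128 + 13):

```shell
#!/usr/bin/env bash
# Signal 13 is SIGPIPE: a process is killed for writing to a pipe
# whose reading end has already been closed.
kill -l 13                       # prints PIPE

# Reproduce it: head exits after reading one line, so the next write by
# `yes` hits a closed pipe. Bash reports death by signal N as 128 + N.
yes | head -n 1 > /dev/null
statuses=("${PIPESTATUS[@]}")    # copy immediately; the next command resets it
echo "yes: ${statuses[0]}, head: ${statuses[1]}"

# The same check after the find | makeblastdb pipeline would show whether
# makeblastdb itself failed (its status is the last element):
#   find ... | makeblastdb ... ; echo "${PIPESTATUS[@]}"
```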

9.7 years ago
5heikki
tar zxvf all.fna.tar.gz --strip-components=2

This should extract all the files directly into the current working directory, dropping the per-virus directories.
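A self-contained sketch of what --strip-components does, run on a small throwaway archive. The layout (all_fna/Virus_X/NC_xxxxxx.fna, two directories above each file) is an assumption chosen to match N=2; listing the real archive with tar -tzf all.fna.tar.gz | head shows how deep its files actually sit. Flattening also assumes file names are unique across directories: if two had the same name, tar would overwrite the earlier one.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Hypothetical layout: each .fna file sits two directories deep,
# e.g. all_fna/Virus_A/NC_000001.fna.
work=$(mktemp -d)
mkdir -p "$work/all_fna/Virus_A" "$work/all_fna/Virus_B" "$work/flat"
printf '>NC_000001\nACGT\n' > "$work/all_fna/Virus_A/NC_000001.fna"
printf '>NC_000002\nTTGA\n' > "$work/all_fna/Virus_B/NC_000002.fna"
tar -czf "$work/demo.tar.gz" -C "$work" all_fna

# --strip-components=2 removes "all_fna/" and "Virus_X/" from each member
# name, so every file lands directly in flat/; the directory entries
# themselves are dropped.
tar -xzf "$work/demo.tar.gz" -C "$work/flat" --strip-components=2
ls "$work/flat"                  # lists NC_000001.fna and NC_000002.fna

# With everything in one directory, a single glob combines the files.
cat "$work"/flat/*.fna > "$work/all_viruses.fna"
```

The last line produces the single FASTA file that the question wanted to hand to formatdb.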
