Question

How to perform BLAST against a segmented database

7.7 years ago

User 6777 ▴ 20

Hi all,

Sorry for the long question, but I am facing this issue because of my hardware limitations (I am using a 32-bit Windows 7 machine with 4 GB of RAM).

I have an arbitrary number of arbitrarily named .fa files in a folder named 'seq', each containing a single FASTA protein sequence, for example:

NP_4500.1.fa
NP_4568.1.fa
NP_45981.3.fa
XM_we679.fa
36498746.fa

In another folder named 'db', I have a database split into 200 segments (because of my computational limitations), named as follows:

hg.part-001.db
hg.part-002.db
hg.part-003.db
..
..
hg.part-200.db

Now I want to run usearch for each sequence against every database segment and generate one result file per segment. For one .fa file (NP_4500.1.fa), that would be:

usearch -ublast ./seq/NP_4500.1.fa -db ./db/hg.part-001.db -evalue 1e-10 -accel 0.5 -blast6out NP_4500.1_part-001.out
usearch -ublast ./seq/NP_4500.1.fa -db ./db/hg.part-002.db -evalue 1e-10 -accel 0.5 -blast6out NP_4500.1_part-002.out
usearch -ublast ./seq/NP_4500.1.fa -db ./db/hg.part-003.db -evalue 1e-10 -accel 0.5 -blast6out NP_4500.1_part-003.out
...
...
usearch -ublast ./seq/NP_4500.1.fa -db ./db/hg.part-200.db -evalue 1e-10 -accel 0.5 -blast6out NP_4500.1_part-200.out

After that, I want to merge the results into a single file, like this:

join NP_4500.1_part-001.out NP_4500.1_part-002.out .. NP_4500.1_part-200.out > NP_4500.1.out

and similarly for the next sequence:

NP_4568.1.fa

...

Now, I can run a cmd script over the FASTA files like this:

for %%F in ("*.fa") do usearch -ublast ./seq/%%F .......

But my question is: how can I combine this command with each of the database segments, and merge the .out files into one result per sequence before proceeding to the next sequence?

I can use a cmd, Perl, or Python script. Thanks for your consideration.

cmd perl python • 2.0k views

Apart from the original problem, which should be solvable with a batch script, I would consider simplifying your life: you could spare yourself a lot of hassle by upgrading to a better computer. A few aspects make your setup much more difficult than it has to be:

Windows and cmd: cmd is not particularly powerful or easy to use for scripting (compared to bash).
Using the free 32-bit version of usearch (instead of NCBI BLAST+), which requires splitting the database. I do not fully understand why you have to split the db or what you are trying to search. Are the data really that big? I think with NCBI BLAST you don't need much more than 4 GB of RAM for a search, even against NR.
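
As for the original problem, here is a minimal sketch in Python (one of the languages the question allows). It is untested against a real usearch installation. It assumes the folder layout and file names from the question and reuses the usearch options exactly as posted. The function names and the 'out' results folder are made up for illustration, and deleting the per-segment files after merging is my own choice.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(query, db, out):
    """Return the usearch call from the question as an argument list."""
    return ["usearch", "-ublast", str(query), "-db", str(db),
            "-evalue", "1e-10", "-accel", "0.5", "-blast6out", str(out)]


def merge_parts(parts, merged):
    """Concatenate the per-segment hit tables into one file."""
    with open(merged, "wb") as dst:
        for part in parts:
            if not part.exists():  # tolerate a segment that produced no file
                continue
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)


def search_query(query, db_files, out_dir, run=subprocess.run):
    """Search one query against every segment, then merge the results."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    for db in db_files:
        # hg.part-001.db -> "part-001"; NP_4500.1.fa -> "NP_4500.1"
        segment = db.stem.split(".", 1)[-1]
        part = out_dir / f"{query.stem}_{segment}.out"
        run(build_command(query, db, part), check=True)
        parts.append(part)
    merged = out_dir / f"{query.stem}.out"
    merge_parts(parts, merged)
    for part in parts:
        part.unlink(missing_ok=True)  # assumption: parts not needed afterwards
    return merged


def main(seq_dir=Path("seq"), db_dir=Path("db"), out_dir=Path("out")):
    # Zero-padded segment names sort correctly as plain strings.
    db_files = sorted(db_dir.glob("hg.part-*.db"))
    for query in sorted(seq_dir.glob("*.fa")):
        print(search_query(query, db_files, out_dir))


if __name__ == "__main__":
    main()
```

Run it from the folder that contains 'seq' and 'db'. Each sequence is finished (searched against all segments and merged) before the next one starts, which is what the question asks for. Because the blast6out format is one hit per line, plain concatenation should be enough to merge; the e-values in each part are computed against that segment only, so they may differ from a search against the whole database.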

ADD REPLY • link 7.7 years ago by Michael 54k

Thanks for the reply. I'll upgrade my machine soon, but for now I need to split the db because -makedb in 32-bit usearch can't handle my UniRef database (20 GB). And I am avoiding NCBI BLAST simply because it is too slow for my requirements (compared with ublast).

ADD REPLY • link 7.7 years ago by User 6777 ▴ 20