Question: How to perform BLAST against segmented database
23 months ago by User 6777 (United States)
User 6777 wrote:

Hi all,

Sorry for the long question, but I am facing this issue because of my hardware limitations (I am using a 32-bit Windows 7 machine with 4 GB of RAM).

I have an arbitrary number of arbitrarily named .fa files in a folder named 'seq', each containing a single FASTA protein sequence, for example:

NP_4500.1.fa
NP_4568.1.fa
NP_45981.3.fa
XM_we679.fa
36498746.fa

In another folder named 'db', I have a database split into 200 segments (because of my computational limitations), named as follows:

hg.part-001.db
hg.part-002.db
hg.part-003.db
..
..
hg.part-200.db

Now I want to run usearch for each sequence against every database segment and write one result file per segment. For one .fa file (NP_4500.1.fa), that would be:

usearch -ublast ./seq/NP_4500.1.fa -db ./db/hg.part-001.db -evalue 1e-10 -accel 0.5 -blast6out NP_4500.1_part-001.out
usearch -ublast ./seq/NP_4500.1.fa -db ./db/hg.part-002.db -evalue 1e-10 -accel 0.5 -blast6out NP_4500.1_part-002.out
usearch -ublast ./seq/NP_4500.1.fa -db ./db/hg.part-003.db -evalue 1e-10 -accel 0.5 -blast6out NP_4500.1_part-003.out
...
...
usearch -ublast ./seq/NP_4500.1.fa -db ./db/hg.part-200.db -evalue 1e-10 -accel 0.5 -blast6out NP_4500.1_part-200.out

After that, I want to merge the results into a single file:

join NP_4500.1_part-001.out NP_4500.1_part-002.out .. NP_4500.1_part-200.out > NP_4500.1.out
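
Note that the Unix `join` utility performs a relational join on a shared field rather than appending files, so the merge step described here amounts to plain concatenation. A minimal Python sketch of that step follows. It assumes the per-segment files follow the `<query>_part-NNN.out` naming used above and contain header-less tabular lines; `merge_parts` and `part_outputs` are hypothetical helper names, not part of usearch.

```python
import glob
import shutil
from pathlib import Path


def part_outputs(query_stem, out_dir="."):
    """Return the per-segment result files for one query, sorted by name.

    With zero-padded segment numbers (001, 002, ...), name order is also
    numeric order.
    """
    pattern = glob.escape(query_stem) + "_part-*.out"
    return sorted(Path(out_dir).glob(pattern))


def merge_parts(part_files, merged_file):
    """Concatenate the given files, in order, into merged_file."""
    with open(merged_file, "wb") as out:
        for part in part_files:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

For the example above, `merge_parts(part_outputs("NP_4500.1"), "NP_4500.1.out")` would write all segment results for that query into one file.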

and similarly for the next sequence:

NP_4568.1.fa

...

Now, I can run a cmd script over each FASTA file, as:

for %%F in ("*.fa") do usearch -ublast ./seq/%%F .......

But my question is: how can I combine this command with each of the database segments and merge the .out files into the result for a single sequence before proceeding to the next?

I can use a cmd, Perl or Python script. Thanks for your consideration.

cmd python perl • 645 views

Apart from the original problem, which should be solvable with a batch script, I would consider simplifying your life: you could spare yourself a lot of hassle by upgrading to a better computer. A few aspects make your setup much more difficult than it has to be:

  • Windows and cmd: cmd is not particularly powerful or easy to use for scripting (compared to bash).
  • Using the free 32-bit version of usearch (instead of NCBI BLAST+), which requires splitting the database. I do not fully understand why you have to split the db or what you are trying to search; are the data that big? I think that with NCBI BLAST you don't need much more than 4 GB of RAM for a search against even NR.
modified 23 months ago • written 23 months ago by Michael Dondrup

Thanks for the reply. I'll upgrade my machine soon, but for now I need to split the db because -makedb in 32-bit usearch can't handle my UniRef database (20 GB). I am avoiding NCBI BLAST simply because it is too slow for my requirements (compared with ublast).

written 23 months ago by User 6777
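
The loop asked about in the question (every query against every database segment, with results merged per query before moving on) can be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the usearch options are copied from the question, the folder layout is `seq/*.fa` and `db/hg.part-*.db`, and the function names, the `out` folder, and the `run` parameter (a hook for swapping out `subprocess.run`) are hypothetical.

```python
import subprocess
from pathlib import Path


def ublast_command(query, db, out, usearch="usearch"):
    """Build one usearch call with the options used in the question."""
    return [usearch, "-ublast", str(query), "-db", str(db),
            "-evalue", "1e-10", "-accel", "0.5", "-blast6out", str(out)]


def search_all(seq_dir="seq", db_dir="db", out_dir="out", run=subprocess.run):
    """Search every query against every segment; write one merged file per query."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    segments = sorted(Path(db_dir).glob("hg.part-*.db"))
    merged_files = []
    for query in sorted(Path(seq_dir).glob("*.fa")):
        merged = out_dir / (query.stem + ".out")      # e.g. NP_4500.1.out
        with open(merged, "wb") as merged_fh:
            for db in segments:
                tag = db.stem.split(".", 1)[1]        # "hg.part-001" -> "part-001"
                part = out_dir / f"{query.stem}_{tag}.out"
                run(ublast_command(query, db, part), check=True)
                # Assumption: the output file may be missing when there are no hits.
                if part.exists():
                    merged_fh.write(part.read_bytes())
                    part.unlink()                     # keep only the merged file
        merged_files.append(merged)
    return merged_files
```

Calling `search_all()` from the folder that contains both `seq` and `db` would finish one sequence completely before starting the next. Appending each segment's output as soon as it is produced avoids keeping 200 intermediate files per query on disk.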