Question: Running BLAST for several split query files
0
3.4 years ago by
seta1.0k
Sweden
seta1.0k wrote:

Hi all,

I'm getting confused by a very basic issue. I have a large query file (about 50 MB) that needs to be run through blastx. To simplify things, I split it into several files (named x00, x01, x02, etc.), but now I'm not sure of the right command to run the BLAST job for these queries. Thanks for sharing your commands.

rna-seq blast alignment • 1.2k views
modified 9 days ago by RamRS17k • written 3.4 years ago by seta1.0k
2
3.4 years ago by
RamRS17k
Houston, TX
RamRS17k wrote:

If you have access to an HPC, run an array job - that's the fastest way to get this done.

For serial processing, just use a loop:

for num in $(seq 1 10)
do
    blastx -query input_${num} -db database.blastdb -out output_${num}.out
done
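
With the file names from the question (x00, x01, ...), the same loop can run over a glob instead of a counter. This is only a sketch: it assumes BLAST+ is installed, that the chunks sit in the current directory, and that database.blastdb is a placeholder for your own database. The run_chunks function and the DRY_RUN switch are hypothetical helpers, not part of BLAST; by default the commands are only printed so you can check them first.

```shell
# Sketch: one blastx run per split query file, one output file per chunk.
# Assumptions: BLAST+ is on PATH; "database.blastdb" is a placeholder name.
# DRY_RUN is a made-up switch: 1 (default) prints the commands, 0 runs them.

run_chunks() {
    db=$1
    shift
    for query in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Preview only: show the command that would be executed.
            echo "blastx -query $query -db $db -out $query.blastx.out"
        else
            # Each chunk gets its own output so nothing is overwritten.
            blastx -query "$query" -db "$db" -out "$query.blastx.out"
        fi
    done
}

# x[0-9][0-9] matches x00, x01, ... in the current directory.
run_chunks database.blastdb x[0-9][0-9]
```

Once the printed commands look right, rerun with DRY_RUN=0 set in the environment.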

You can use GNU parallel or different script files to deal with each BLASTX run.
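
A possible GNU parallel version is sketched below. It assumes GNU parallel (not the moreutils tool of the same name) and BLAST+ are installed, and again uses database.blastdb as a placeholder. The blast_parallel wrapper and the JOBS and PARALLEL_OPTS variables are made up for this example; the default of 4 simultaneous jobs is just a starting point to tune on your own machine.

```shell
# Sketch: run several blastx instances at once with GNU parallel.
# Assumptions: GNU parallel and BLAST+ are on PATH;
# "database.blastdb" is a placeholder database name.
# JOBS and PARALLEL_OPTS are hypothetical knobs for this example.
JOBS=${JOBS:-4}

blast_parallel() {
    # "$@" is the list of chunk files; {} is replaced by each name in turn.
    # PARALLEL_OPTS is left unquoted on purpose so it can be empty.
    parallel ${PARALLEL_OPTS:-} -j "$JOBS" \
        'blastx -query {} -db database.blastdb -out {}.blastx.out' ::: "$@"
}

# Preview the commands first, then run them for real:
#   PARALLEL_OPTS=--dry-run blast_parallel x[0-9][0-9]
#   blast_parallel x[0-9][0-9]
```

Defining a function rather than running the command directly keeps the preview and the real run identical apart from the --dry-run option.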

If you have access to HPC, look up job arrays - these supply an ARRAY_ID or some such iterator variable value to each job in the array, and you can then use this array id to control which input file is used by that job. The command in the HPC script would look like:

blastx -query input_${ARRAY_ID} -db database.blastdb -out output_${ARRAY_ID}.out

and the command to submit the script would include the range of the ARRAY_ID variable like so:

qsub -t 1-10 job.pbs  # assuming your HPC uses PBS
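
For illustration, a job.pbs along these lines might work. It is a sketch under several assumptions: a Torque-style scheduler (where the array index arrives as PBS_ARRAYID; PBS Pro uses "#PBS -J 1-10" and PBS_ARRAY_INDEX instead), ten chunks named x00 to x09, and database.blastdb as a placeholder. The chunk_for_id helper is hypothetical and only maps the 1-based array index onto the zero-padded file names.

```shell
#!/bin/sh
#PBS -N blastx_array
#PBS -t 1-10
# Torque-style array directive; on PBS Pro use "#PBS -J 1-10" instead.
# Assumptions: chunks are named x00..x09; "database.blastdb" is a placeholder.

# Hypothetical helper: map a 1-based array index to a chunk name (1 -> x00).
chunk_for_id() {
    printf 'x%02d\n' "$(( $1 - 1 ))"
}

# Torque sets PBS_ARRAYID, PBS Pro sets PBS_ARRAY_INDEX; accept either.
id=${PBS_ARRAYID:-${PBS_ARRAY_INDEX:-}}

# Do nothing when run outside the scheduler (no array index available).
if [ -n "$id" ]; then
    # Jobs start in the home directory; move to where qsub was invoked.
    cd "${PBS_O_WORKDIR:-.}" || exit 1
    query=$(chunk_for_id "$id")
    blastx -query "$query" -db database.blastdb -out "$query.blastx.out"
fi
```

Since the range is set inside the script with "#PBS -t", the -t option on the qsub command line becomes optional.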

HTH

modified 10 days ago • written 3.4 years ago by RamRS17k

Wouldn't -num_threads work?

modified 10 days ago by RamRS17k • written 3.4 years ago by geek_y8.7k

Threading the BLAST operation doesn't do much, because most of the work still runs serially. It only parallelizes some of the overhead, on the assumption that each thread will be disk I/O bound. To force parallelism and let the OS worry about disk I/O, we manually run several instances. If you test it, you should see a speedup until 2-4 processes are running; beyond that they slow down regardless of CPU count.

modified 10 days ago by RamRS17k • written 3.4 years ago by karl.stamm3.3k

Thanks so much for your prompt reply, Ram. I'll try it. I heard that, in your experience, blastall is much faster than ncbi-blast+ when running blastx on a small query. Could you please let us know whether you have ever compared the results of the two programs for the same query file, and whether they were identical?

modified 10 days ago by RamRS17k • written 3.4 years ago by seta1.0k

Yes, I did find that blastall was faster than blast+ for smaller query sequences, but that was in 2013. It might not be a valid observation today - blast+ might have been optimized since then.

I am sorry - I did not compare the results. This was early in my HPC experience, so I was making a ton of mistakes and blast+ was taking too long per learning cycle. Also, I just wanted to get it done and wasn't looking to learn such nuanced matters, sorry.

modified 10 days ago • written 3.4 years ago by RamRS17k

Thanks, Ram.

written 3.4 years ago by seta1.0k
0
3.4 years ago by
Spain. Universidad de Córdoba
Antonio R. Franco3.8k wrote:

Try using Blast2GO as well. You can run different BLAST searches and get even more information, such as mapping, domains, etc.

modified 10 days ago by RamRS17k • written 3.4 years ago by Antonio R. Franco3.8k