Question: Submitting several BLAST queries using NCBIWWW at once
pawlowac (Canada) wrote, 3.5 years ago:

Hi everyone,

I am running blastp through NCBIWWW in Biopython, and I need to BLAST 50-100 sequences at a time. Right now I am going through the list one by one. I would like to submit several of these at once.

Is there a way to do this?

Tags: blast, biopython, ncbiwww · 1.1k views
(written 3.5 years ago by pawlowac; modified 3.5 years ago by skbrimer)
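
A possible route that stays within NCBIWWW: as far as I know, NCBIWWW.qblast accepts a FASTA-formatted string as its sequence argument, and NCBI's web BLAST accepts several FASTA records in one query, so a whole batch can go out as one request. The sketch below assumes that behavior. The helper names to_multi_fasta and qblast_batch and the (identifier, sequence) input format are hypothetical, and NCBI may still time out or refuse large batches.

```python
from typing import Iterable, Tuple


def to_multi_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Join (identifier, sequence) pairs into one multi-FASTA string."""
    lines = []
    for ident, seq in records:
        lines.append(">" + ident)
        # Wrap the sequence so each line holds at most `width` residues.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


def qblast_batch(records, program: str = "blastp", database: str = "nr") -> str:
    """Submit all records in a single qblast request and return the XML text.

    Assumes NCBI accepts several FASTA records in one query. Needs Biopython
    and network access; the import is lazy so to_multi_fasta works without it.
    """
    from Bio.Blast import NCBIWWW

    handle = NCBIWWW.qblast(program, database, to_multi_fasta(records))
    try:
        return handle.read()
    finally:
        handle.close()


if __name__ == "__main__":
    # Hypothetical example sequences; only the query string is built here.
    demo = [("query1", "MKTAYIAKQRQISFVKSHFSRQ"), ("query2", "MSDNELQKLAEEAGVPL")]
    print(to_multi_fasta(demo))
```

If the batch is accepted, the returned XML should contain one Iteration per query, which Bio.Blast.NCBIXML.parse yields as separate records.
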
skbrimer (United States) answered, 3.5 years ago:

Hi pawlowac,

I had a very similar question a couple of months ago ("Using Biopython and BLAST+ to automate de novo viral contig sorting"), and what Peter says there is true. The short answer is that you do not need Biopython: you can use the standalone BLAST+ program and pass the file containing your sequences as the query. It works with any number of sequences; it will just take some time.

(answered 3.5 years ago by skbrimer)
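
A minimal sketch of the BLAST+ route, driven from Python with subprocess. It assumes blastp from BLAST+ is on the PATH and that the search runs on NCBI's servers via -remote (a local database would drop that flag). The file names queries.fasta and results.xml and the function names are hypothetical. -outfmt 5 requests BLAST XML, which Biopython's NCBIXML parser reads.

```python
import shlex
import subprocess
from typing import List


def blastp_remote_command(query_fasta: str, out_xml: str, db: str = "nr") -> List[str]:
    """Build a blastp command that sends every sequence in query_fasta to NCBI."""
    return [
        "blastp",
        "-query", query_fasta,  # one multi-FASTA file holding all the queries
        "-db", db,              # database name on the NCBI servers
        "-remote",              # run the search at NCBI instead of locally
        "-outfmt", "5",         # BLAST XML output
        "-out", out_xml,
    ]


def run_blastp_remote(query_fasta: str, out_xml: str, db: str = "nr") -> None:
    """Run the command; raises CalledProcessError if blastp exits with an error."""
    subprocess.run(blastp_remote_command(query_fasta, out_xml, db), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works offline.
    print(shlex.join(blastp_remote_command("queries.fasta", "results.xml")))
```

Building the argument list separately from running it makes the exact command easy to inspect or paste into a shell.
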

Thanks for the answer. I had first used BLAST+ for this but kept getting timeout errors. I tried again after your suggestion (with exactly the same command) and it works great now. The NCBI connection must have been unreliable, as usual.

(reply by pawlowac, 3.5 years ago)

Great, I'm glad it worked for you. :)

(reply by skbrimer, 3.5 years ago)

OK, I take it back. It worked once, but now it says "CPU limit exceeded". There were 150 proteins I was trying to BLAST...

(reply by pawlowac, 3.5 years ago)
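
One workaround that may help with the CPU limit (an assumption, not something confirmed in this thread) is to split the 150 proteins into smaller batches and run each batch as its own search. A minimal stdlib sketch; split_fasta and the batch size of 25 are hypothetical choices.

```python
from typing import List


def split_fasta(text: str, batch_size: int) -> List[str]:
    """Split multi-FASTA text into chunks holding at most batch_size records each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        # A header line closes the previous record.
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():  # skip blank lines
            current.append(line)
    if current:
        records.append("\n".join(current))
    # Group consecutive records and re-join each group into one FASTA string.
    return [
        "\n".join(records[i:i + batch_size]) + "\n"
        for i in range(0, len(records), batch_size)
    ]


if __name__ == "__main__":
    # Hypothetical input: 150 tiny protein records.
    demo = "".join(f">p{i}\nMKV\n" for i in range(150))
    print(len(split_fasta(demo, 25)))
```

Each chunk can be written to its own file and passed to blastp -query in turn, so a single request never carries all 150 proteins.
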

You can limit the number of results with the '-max_target_seqs' search parameter. I think the manual sets the default to 500, or something similarly high. If you only need a few close hits, you can set it to a fixed number. In my case I was only concerned with the closest match, so I run it with

-max_target_seqs 1
(reply by skbrimer, 3.5 years ago)
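
For anyone staying with NCBIWWW, the closest counterpart to -max_target_seqs in Biopython's qblast is, as far as I know, the hitlist_size argument (default 50), with alignments and descriptions (default 500) controlling how many hits are reported. A minimal sketch; the function names are hypothetical and the qblast call needs Biopython plus network access.

```python
def hit_limit_kwargs(max_hits: int) -> dict:
    """Keyword arguments that cap how many hits qblast searches for and reports."""
    if max_hits < 1:
        raise ValueError("max_hits must be at least 1")
    return {"hitlist_size": max_hits, "alignments": max_hits, "descriptions": max_hits}


def qblast_top_hits(fasta_query: str, max_hits: int = 1,
                    program: str = "blastp", database: str = "nr") -> str:
    """Run one qblast search limited to max_hits and return the XML text."""
    from Bio.Blast import NCBIWWW  # lazy import: needs Biopython installed

    handle = NCBIWWW.qblast(program, database, fasta_query, **hit_limit_kwargs(max_hits))
    try:
        return handle.read()
    finally:
        handle.close()
```
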

Unfortunately, I need the diversity of hits, and there is significant overlap in results between the sequences. I end up parsing the XML results with Biopython, grabbing the sequence IDs that meet certain conditions, and checking for duplicates before using efetch to download the FASTA files. Oh well, back to the drawing board.

(reply by pawlowac, 3.5 years ago)
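
The parse, filter, de-duplicate, fetch workflow described above could look roughly like this. To stay runnable without Biopython, the sketch swaps Bio.Blast.NCBIXML for the standard library's ElementTree reading BLAST XML (-outfmt 5) directly. The e-value cutoff is a hypothetical stand-in for the unspecified "certain conditions", and the function names are hypothetical too. Only the final step uses Biopython's Bio.Entrez.efetch, which needs a contact email and network access.

```python
import xml.etree.ElementTree as ET
from typing import List


def unique_hit_accessions(xml_text: str, max_evalue: float = 1e-10) -> List[str]:
    """Collect hit accessions whose best HSP passes max_evalue, dropping duplicates.

    First-seen order is kept, so a hit shared by several queries appears once.
    """
    root = ET.fromstring(xml_text)
    seen = set()
    accessions = []
    for hit in root.iter("Hit"):
        evalues = [float(e.text) for e in hit.iter("Hsp_evalue")]
        if not evalues or min(evalues) > max_evalue:
            continue  # hypothetical filter condition
        acc = hit.findtext("Hit_accession")
        if acc and acc not in seen:
            seen.add(acc)
            accessions.append(acc)
    return accessions


def fetch_fasta(accessions: List[str], email: str) -> str:
    """Download protein FASTA for all accessions in one efetch call (needs Biopython)."""
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(db="protein", id=",".join(accessions),
                           rettype="fasta", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()
```

De-duplicating before efetch means each shared hit is downloaded only once, however many queries it matched.
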