Question: How can I run a Biopython qblast search targeting a specific set of organisms?
sviatoslav.kendall (United States) wrote, 6 days ago:

I want to use Biopython's qblast() command to query NCBI's databases but I want to limit my search to specific organisms. Looking at the documentation for this command, I can guess that the "entrez_query" parameter might be helpful but I have not been able to find any information about what sort of value it expects. I tried providing a taxid value which did not work:

result_handle = NCBIWWW.qblast('blastp', 'nr', record.seq, entrez_query='(3702)')
result_handle = NCBIWWW.qblast('blastp', 'nr', record.seq, entrez_query='(taxid=3702)')
result_handle = NCBIWWW.qblast('blastp', 'nr', record.seq, entrez_query='(taxid:3702)')

Each time I got an error message explaining that this was an invalid Entrez query, but I'm not sure what a valid Entrez query looks like. I have also looked for examples of this option being used but have not found any.

Any and all help would be appreciated.

Tags: blast, biopython
zorbax wrote, 5 days ago:

You need to use txid3702[ORGN]. I tested this with the Calvin cycle protein CP12-2:

import urllib.request
from Bio import SeqIO
from Bio.Blast import NCBIWWW

url = ''
urllib.request.urlretrieve(url, "chain_N.faa")

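# Read the single FASTA record that was just downloaded
record = SeqIO.read("chain_N.faa", format="fasta")
# Restrict the search to Arabidopsis thaliana (taxid 3702)
result_handle = NCBIWWW.qblast('blastp', 'nr', record.seq, entrez_query='txid3702[ORGN]')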

Thank you so much! Any chance you know how to specify multiple organisms in a single query? Or how to specify that certain taxonomic groups should be excluded? Ultimately, I'm going to want to search within taxonomic groups while excluding specific species...

written 5 days ago by sviatoslav.kendall

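You can use a Boolean search, something like "all [filter] NOT (environmental samples[organism] OR metagenomes[orgn]) AND (txid3702[ORGN] OR txid9606[ORGN])". This example excludes environmental and metagenomic samples and searches both Arabidopsis thaliana (txid3702) and human (txid9606). The two taxids are joined with OR because each record belongs to a single organism, so joining them with AND would match nothing.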
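
If you need to build these filters for many combinations of groups, a small helper can assemble the string for you. This is only a sketch: build_entrez_query and run_search are hypothetical names, not part of Biopython, and the sketch assumes that taxids to include should be joined with OR and that everything to exclude goes into a single NOT (...) group.

```python
def build_entrez_query(include_taxids, exclude_taxids=(), exclude_terms=()):
    """Build a filter string for the entrez_query argument of qblast.

    include_taxids: NCBI taxonomy IDs to search within (joined with OR).
    exclude_taxids: taxonomy IDs to leave out of the search.
    exclude_terms:  organism names to leave out, e.g. "metagenomes".
    """
    if not include_taxids:
        raise ValueError("at least one taxid to include is required")

    # Each record belongs to one organism, so alternatives are OR-ed.
    included = " OR ".join(f"txid{t}[ORGN]" for t in include_taxids)
    query = f"({included})"

    # Collect everything that should be removed into one NOT group.
    excluded = [f"txid{t}[ORGN]" for t in exclude_taxids]
    excluded += [f"{term}[ORGN]" for term in exclude_terms]
    if excluded:
        query += " NOT (" + " OR ".join(excluded) + ")"
    return query


def run_search(sequence, entrez_query, program="blastp", database="nr"):
    """Submit the search to NCBI (needs Biopython and network access)."""
    # Imported here so the query builder above works without Biopython.
    from Bio.Blast import NCBIWWW

    return NCBIWWW.qblast(program, database, sequence,
                          entrez_query=entrez_query)


if __name__ == "__main__":
    query = build_entrez_query(
        [3702, 9606],
        exclude_terms=["environmental samples", "metagenomes"],
    )
    print(query)
```

To search a whole taxonomic group while leaving out one species, pass the group's taxid in include_taxids and the species' taxid in exclude_taxids, then hand the resulting string to run_search(record.seq, query).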

written 5 days ago by zorbax