Question: How can I run a Biopython qblast search targeting a specific set of organisms?
Asked 6 days ago by sviatoslav.kendall (United States):

I want to use Biopython's qblast() function to query NCBI's databases, but I want to limit my search to specific organisms. From the documentation for this function, I gather that the "entrez_query" parameter might be helpful, but I have not been able to find any information about what sort of value it expects. I tried providing a taxid value in several forms, none of which worked:

result_handle = NCBIWWW.qblast('blastp', 'nr', record.seq, entrez_query='(3702)')
result_handle = NCBIWWW.qblast('blastp', 'nr', record.seq, entrez_query='(taxid=3702)')
result_handle = NCBIWWW.qblast('blastp', 'nr', record.seq, entrez_query='(taxid:3702)')

Each time I got an error message explaining that this was an invalid Entrez query, but I'm not sure what a valid Entrez query looks like. I have also looked for examples of this option being used but have not found any.

Any and all help would be appreciated.

Answered 5 days ago by zorbax (Mexico):

You need to use txid3702[ORGN]. I tested this with the Calvin cycle protein CP12-2:

import urllib.request
from Bio import SeqIO
from Bio.Blast import NCBIWWW

# Download the CP12-2 protein sequence (UniProt Q9LZP9) in FASTA format
url = 'https://www.uniprot.org/uniprot/Q9LZP9.fasta'
urllib.request.urlretrieve(url, "chain_N.faa")

# Restrict the BLAST search to organism taxid 3702 via entrez_query
record = SeqIO.read("chain_N.faa", format="fasta")
result_handle = NCBIWWW.qblast('blastp', 'nr', record.seq,
                               entrez_query="txid3702[ORGN]")
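
The forms tried in the question ('(3702)', '(taxid=3702)', '(taxid:3702)') were rejected, while the txid3702[ORGN] form from the answer was accepted. A minimal sketch of a helper that always produces that form is below. The names organism_term and blast_within_organism are made up for illustration and are not part of Biopython; the only Biopython call used is NCBIWWW.qblast with its entrez_query parameter, exactly as in the answer.

```python
def organism_term(taxid):
    """Format an NCBI Taxonomy ID as an Entrez organism term.

    Hypothetical helper (not a Biopython function).
    """
    taxid = int(taxid)  # accepts 3702 or "3702"; raises ValueError for "taxid=3702"
    if taxid <= 0:
        raise ValueError("taxid must be a positive integer")
    return f"txid{taxid}[ORGN]"


def blast_within_organism(sequence, taxid, program="blastp", database="nr"):
    """Run qblast restricted to one organism (needs Biopython and network access)."""
    # Imported here so organism_term() can be used without Biopython installed.
    from Bio.Blast import NCBIWWW

    return NCBIWWW.qblast(program, database, sequence,
                          entrez_query=organism_term(taxid))
```

With the record from the answer, the call would then look like blast_within_organism(record.seq, 3702). Converting the taxid with int() means a malformed value fails locally with a ValueError instead of being sent to NCBI and rejected there.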

Thank you so much! Any chance you know how to specify multiple organisms in a single query? Or how to specify that certain taxonomic groups should be excluded? Ultimately, I'm going to want to search within taxonomic groups while excluding specific species...

written 5 days ago by sviatoslav.kendall

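You can use a Boolean search, something like "all [filter] NOT (environmental samples[organism] OR metagenomes[orgn]) AND (txid3702[ORGN] OR txid9606[ORGN])". This example excludes environmental/metagenomic samples and includes human (txid9606) in the search alongside txid3702. The two organism terms are joined with OR inside parentheses: a record belongs to only one organism, so joining them with AND would match nothing.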

written 5 days ago by zorbax
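
For the follow-up question (several organisms at once, or a taxonomic group minus some species), the Boolean query can be assembled from lists of taxids. This is only a sketch under stated assumptions: build_entrez_query is a hypothetical helper, not a Biopython or NCBI function, and it assumes that a txid...[ORGN] term also matches taxa below the given one, which is how Entrez organism searches normally behave but is worth checking on a small query first. The taxids in the usage lines are 3702 and 9606 from the thread, plus 33090 (Viridiplantae) as an example of a higher-level group.

```python
def build_entrez_query(include, exclude=()):
    """Build an Entrez query matching any taxid in `include` and none in `exclude`.

    Hypothetical helper for illustration; taxids may be ints or digit strings.
    """
    if not include:
        raise ValueError("at least one taxid to include is required")
    # Alternatives are joined with OR: a record belongs to one organism only.
    included = " OR ".join(f"txid{int(t)}[ORGN]" for t in include)
    query = f"({included})"
    if exclude:
        # Everything listed here is subtracted from the included set.
        excluded = " OR ".join(f"txid{int(t)}[ORGN]" for t in exclude)
        query += f" NOT ({excluded})"
    return query


print(build_entrez_query([3702, 9606]))
# (txid3702[ORGN] OR txid9606[ORGN])
print(build_entrez_query([33090], exclude=[3702]))
# (txid33090[ORGN]) NOT (txid3702[ORGN])
```

The resulting string is passed unchanged as the entrez_query argument of NCBIWWW.qblast. The explicit parentheses matter because Entrez applies Boolean operators from left to right when none are given.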