How can I run a Biopython qblast search targeting a specific set of organisms?
3.9 years ago

I want to use Biopython's qblast() command to query NCBI's databases but I want to limit my search to specific organisms. Looking at the documentation for this command, I can guess that the "entrez_query" parameter might be helpful but I have not been able to find any information about what sort of value it expects. I tried providing a taxid value which did not work:

result_handle = NCBIWWW.qblast('blastp', 'nr', record.seq, entrez_query='(3702)')
result_handle = NCBIWWW.qblast('blastp', 'nr', record.seq, entrez_query='(taxid=3702)')
result_handle = NCBIWWW.qblast('blastp', 'nr', record.seq, entrez_query='(taxid:3702)')

Each time I got an error message saying that this was an invalid Entrez query, but I'm not sure what a valid Entrez query looks like. I have also looked for examples of this option being used but have not found any.

Any and all help would be appreciated.

BLAST Biopython • 4.3k views
3.9 years ago
zorbax ▴ 610

You need to use txid3702[ORGN]. I tested this with the Calvin cycle protein CP12-2:

import urllib.request
from Bio import SeqIO
from Bio.Blast import NCBIWWW

url = 'https://www.uniprot.org/uniprot/Q9LZP9.fasta'
urllib.request.urlretrieve(url, "chain_N.faa")

record = SeqIO.read("chain_N.faa", format="fasta")
result_handle = NCBIWWW.qblast('blastp', 'nr', record.seq, 
                               entrez_query="txid3702[ORGN]")

Thank you so much! Any chance you know how to specify multiple organisms in a single query? Or how to specify that certain taxonomic groups should be excluded? Ultimately, I'm going to want to search within taxonomic groups while excluding specific species...


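You can use a boolean search, something like "all [filter] NOT (environmental samples[organism] OR metagenomes[orgn]) AND (txid3702[ORGN] OR txid9606[ORGN])". This example excludes environmental/metagenomic samples and includes human in the search alongside Arabidopsis. Note that the organism terms are joined with OR: a sequence belongs to a single organism, so joining two txid terms with AND cannot match both at once.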
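
For including several organisms while excluding others, it may be easier to assemble the query string in code. The following is only a sketch: `build_entrez_query` is a hypothetical helper (not part of Biopython), and it assumes the txidNNNN[ORGN] syntax from the answer above combined with the usual Entrez OR / NOT operators.

```python
def build_entrez_query(include_taxids, exclude_taxids=()):
    """Build an Entrez query string that restricts a BLAST search by taxonomy.

    Hypothetical helper, not a Biopython function. Taxids are NCBI Taxonomy
    IDs given as ints or numeric strings.
    """
    if not include_taxids:
        raise ValueError("at least one taxid to include is required")

    # Organisms to search within are alternatives, so they are joined with OR.
    included = " OR ".join(f"txid{int(t)}[ORGN]" for t in include_taxids)
    query = f"({included})"

    # Organisms to leave out are grouped and subtracted with NOT.
    if exclude_taxids:
        excluded = " OR ".join(f"txid{int(t)}[ORGN]" for t in exclude_taxids)
        query += f" NOT ({excluded})"
    return query


query = build_entrez_query([3702, 9606])
print(query)  # (txid3702[ORGN] OR txid9606[ORGN])
```

The resulting string would then be passed as entrez_query=query in the NCBIWWW.qblast call. Passing a higher-level taxon as the included taxid and a species as the excluded one should cover the "search within a group but skip a species" case, although I have not tested every combination against NCBI.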


Hi, I have tried both AND and lowercase and to target a set of organisms, but my BLAST output only contains hits for the last txid[ORGN] term instead of showing results for all of the organisms. Is there a way to loop over multiple taxids instead?
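
One way to loop over several taxids is to submit one qblast request per organism and keep the results separately. This is a rough sketch rather than a tested recipe: `blast_per_taxid` and its `blast_fn` and `delay_seconds` parameters are hypothetical names introduced here, and it assumes the same 'blastp' / 'nr' / txidNNNN[ORGN] call shown in the answer above.

```python
import time


def blast_per_taxid(sequence, taxids, blast_fn=None, delay_seconds=0):
    """Run one BLAST search per taxid and return {taxid: raw XML text}.

    Hypothetical helper. blast_fn defaults to Bio.Blast.NCBIWWW.qblast and
    can be swapped out for a stub when testing without network access.
    """
    if blast_fn is None:
        # Imported lazily so the helper can be exercised without Biopython.
        from Bio.Blast import NCBIWWW
        blast_fn = NCBIWWW.qblast

    results = {}
    for i, taxid in enumerate(taxids):
        # Optional pause between submissions to avoid hammering the server.
        if i and delay_seconds:
            time.sleep(delay_seconds)
        handle = blast_fn("blastp", "nr", sequence,
                          entrez_query=f"txid{int(taxid)}[ORGN]")
        results[taxid] = handle.read()  # keep the XML so it can be parsed later
        handle.close()
    return results
```

Usage would be something like results = blast_per_taxid(record.seq, [3702, 9606], delay_seconds=10). Each value can be wrapped in io.StringIO and parsed with Bio.Blast.NCBIXML.read. Separate searches are slower than one combined OR query, but each organism gets its own hit list.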
