Searching wgs (whole genome shotgun) database with biopython
1
2
Entering edit mode
5.0 years ago

I used to be able to use BioPython to blast the wgs database at NCBI using the following handle:

NCBIWWW.qblast(program="blastn", database= "wgs" sequence=fasta_string, entrez_query= organism + '[ORGN]')

This does not appear to work anymore, and returns an xml with no results. For searches in wgs the taxon must be specified, and I have previously used the entrez_query parameter to do this. In the web interface this is not available and there is a "limit by" field which has to be filled in instead. Does anybody know how to specify the "limit by" parameter in BioPython or to specify other parameters that may be required for the wgs blast? It is a bit frustrating when scripts stop working because of changes in the ncbi interface.

Daniel

blast genome • 1.9k views
1
Entering edit mode
4.7 years ago
jbalberge ▴ 170

The list of WGS databases can be retrieved with taxids2wgs.pl: ftp://ftp.ncbi.nlm.nih.gov/blast/WGS_TOOLS

\$ perl taxid2wgs.pl -url_api_ready 487 > wgs_dbs_tax_487


It contains the WGS accesses

WGS_VDB://MTKJ01
WGS_VDB://MTKM01
WGS_VDB://LXLB01
WGS_VDB://LXLA01
WGS_VDB://MJAV01
WGS_VDB://NBTU01


And in biopython

fasta_string = open("myfasta.fa").read()

result_handle = NCBIWWW.qblast(program="tblastn",
database=db,
(...))



This is not convenient but gives you the hits:

APBS01000081
AEEF01000091
NBTU01000001
AEPJ01000146
LXLB01000009
LXLA01000006
MJAV01000001
CMWW01000023

0
Entering edit mode

IIUC, taxid2wgs doesn't report all wgs records. for example:

perl taxid2wgs.pl 90371 | grep AAJDPJ


would return nothing, even though https://www.ncbi.nlm.nih.gov/assembly/GCA_007903735.1 exists.