8.9 years ago by
The BLAST section of the Biopython documentation is not terribly thorough. The relevant section is here.
Before starting, you'll need to create a local BLAST database. To my knowledge there is no way to do this directly through Biopython (but you could use the subprocess module to automate the command line if you really wanted to). The documentation for the BLAST tool is on the NCBI website here.
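If you do want to automate the database step, here is a minimal sketch using the subprocess module. It assumes the BLAST+ `makeblastdb` executable is installed and on your PATH; the helper names `build_makeblastdb_cmd` and `make_blast_db` are made up for illustration and are not part of Biopython. Note that blastx searches a protein database, so you would pass 'prot' for that case and 'nucl' for a nucleotide database.

```python
import subprocess


def build_makeblastdb_cmd(fasta_path, db_name, dbtype):
    """Return the argument list for a makeblastdb call (nothing is run here)."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    # -in: input FASTA, -dbtype: molecule type, -out: base name of the database
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]


def make_blast_db(fasta_path, db_name, dbtype):
    """Run makeblastdb and return its exit code (0 means success)."""
    return subprocess.call(build_makeblastdb_cmd(fasta_path, db_name, dbtype))
```

Calling something like make_blast_db('/path/to/proteins.fasta', 'databasename', 'prot') should then leave a database you can pass as the db argument when running BLAST.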
You'll obviously need to download the whole genome sequences. I suggest using the NCBI FTP site; I can never seem to find the right link in the normal web portal.
Once you have all of the relevant downloads and databases created, you'll simply need to run the BLAST query in a loop that processes all of the data. Something like this should work (I don't have the required data to test this, but it should get you 95% of the way there):
import shlex
import subprocess

from Bio.Blast.Applications import NcbiblastxCommandline
from Bio.Blast import NCBIXML
from Bio import SeqIO

SOURCE_FASTA = '/path/to/source/seq.fasta'
DATABASE = 'databasename'  # should be in the path but YMMV

with open(SOURCE_FASTA) as inhandle:
    for seq in SeqIO.parse(inhandle, 'fasta'):
        # write a scratch file holding just this record
        with open('scratch.fasta', 'w') as outhandle:
            SeqIO.write(seq, outhandle, 'fasta')
        # create the command-line string
        cline = NcbiblastxCommandline(query='scratch.fasta',
                                      db=DATABASE, evalue=0.001, outfmt=5, out='scratch.xml')
        # actually run BLAST (split the string into an argument list first)
        return_code = subprocess.call(shlex.split(str(cline)))
        if return_code == 0:
            # if it was successful then process it
            with open('scratch.xml') as xmlhandle:
                blast_record = NCBIXML.read(xmlhandle)
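Once you have a blast_record, you still need to pull the hits out of it. As a rough sketch, assuming the record exposes the usual `alignments` list, with each alignment carrying a `title` and a list of `hsps` that each have an `expect` value (as the records returned by NCBIXML do), something like the following would collect the significant hits. The function name `significant_hits` and the default cutoff are my own choices for illustration.

```python
def significant_hits(blast_record, max_evalue=0.001):
    """Return (hit title, e-value) pairs for every HSP at or below max_evalue."""
    hits = []
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            # hsp.expect is the e-value reported by BLAST for this HSP
            if hsp.expect <= max_evalue:
                hits.append((alignment.title, hsp.expect))
    return hits
```

You could call this inside the loop right after NCBIXML.read and append the results to a list keyed by the query's seq.id.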
Hope that helps.
modified 5 months ago
RamRS ♦ 20k
8.9 years ago by
Will ♦ 4.5k