Question: Output FASTA from Biopython blast result
0
gravatar for traviata
15 months ago by
traviata10
traviata10 wrote:

I'm trying to use the Biopython wrapper for blastp to download matching protein sequences for some sequences that I have stored on my computer. I would like these matching sequences in FASTA format, similar to how on the web server one can select all sequences producing significant alignment and download "FASTA (aligned sequences)". This was my attempt:

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

base_dir = "/Users/kjsdhjasv/Documents/cs2017/coevo/"
inputs = ['rnapBeta', 'S3', 'S4', 'S5']
for input in inputs:
    fasta_string = open(base_dir + 'data/e_coli_k12/' + input + '.fa').read()
    out = base_dir + 'results/' + input + '.fa'
    with NCBIWWW.qblast("blastp", "nr", fasta_string) as result_handle:
        with open(out, 'w') as out_file:
            blast_record = NCBIXML.read(result_handle)
            for alignment in blast_record.alignments:
                for hsp in alignment.hsps:
                    out_file.write(alignment.title)
                    out_file.write(hsp.sbjct)

How can I extract a fasta from the blast result?

bio.blast blastp biopython fasta • 1.2k views
ADD COMMENTlink modified 15 months ago • written 15 months ago by traviata10
1
gravatar for traviata
15 months ago by
traviata10
traviata10 wrote:

I believe I found a solution to my own question. The easiest way to do this seems to through SearchIO instead of NCBIXML. Here is the code I ended up using:

from Bio.Blast import NCBIWWW
from Bio import SearchIO
from Bio import SeqIO

base_dir = "/Users/kjsdhjasv/Documents/cs2017/coevo/"
inputs = ['rnapBeta', 'S3', 'S4', 'S5']

for input in inputs:
    blast_xml = base_dir + 'data/blast_xml/' + input + '.xml'
    blast_out = base_dir + 'data/blast_out/' + input + '.fa'

    # run BLAST
    fasta_string = open(base_dir + 'data/e_coli_k12/' + input + '.fa').read()
    with NCBIWWW.qblast("blastp", "nr", fasta_string) as result_handle:
        with open(blast_xml, 'w') as xml_file:
            xml_file.write(result_handle.read())

    # parse xml and write to fasta
    blast_qresult = SearchIO.read(blast_xml, 'blast-xml')
    records = []
    for hit in blast_qresult:
        records.append(hit[0].hit)
    SeqIO.write(records, blast_out, "fasta")
ADD COMMENTlink written 15 months ago by traviata10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1996 users visited in the last hour