Question: Blast-XML and Biopython: extract query-anchored results (i.e. MSA)
2.4 years ago, LeWöps wrote:

Dear all,

I have some trouble getting BLAST (command-line) and Biopython to work together. I want a pipeline that runs a single protein query against a BLAST database and returns a multiple sequence alignment of the hits into Python, where I can do further analysis on it.

In tblastn, I can choose the output - my desired format would be "3 = flat query-anchored, show identities", which returns a multiple sequence alignment. Unfortunately, this is, as far as I understand, not really compatible with biopython.

Biopython instead accepts e.g. a BLAST XML file as input, which can be selected as an output format in tblastn. However, from the XML I am only able to extract pairwise alignments in Biopython; the MSA seems to be lost. The BLAST record class in Bio.Blast.Record even has a multiple_alignment attribute, but it is always None.

I hope my problem is understandable - does anyone have experience in how to get the 'query-anchored' output from blast into (bio)python?

2.4 years ago, Peter (Scotland, UK) wrote:

There is indeed a .multiple_alignment attribute in the BLAST record object, and there is code in Bio.Blast.NCBIStandalone (the plain text BLAST parser) which should populate it. There is an example of sorts in one test within Tests/ which might be useful. As a self contained example based on that, try something like this:

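from Bio.Blast.NCBIStandalone import BlastParser
from Bio.Alphabet import IUPAC

# Plain text BLAST parser; the path below is relative to Biopython's Tests/ folder
parser = BlastParser()
with open("Blast/text_2010L_blastp_006.txt") as handle:
    record = parser.parse(handle)

# Convert the parsed query-anchored MSA into a generic alignment object
generic_align = record.multiple_alignment.to_generic(IUPAC.protein)
test_seq = generic_align[0].seq
assert test_seq.alphabet == IUPAC.protein
# The first 60 columns should match the first parsed line of the text MSA
assert str(test_seq[:60]) == record.multiple_alignment.alignment[0][2]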

It might be nice to add this to Bio.AlignIO, but I personally have never used the BLAST text MSA output, and I don't think it works within Bio.SearchIO.
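
If you would rather stay with the XML output, a query-anchored alignment can in principle be rebuilt from the pairwise HSPs by projecting each hit onto the query coordinates. The following is only a rough sketch of that idea using the standard library rather than Biopython. The function names read_hsps and query_anchored_row are made up for illustration, it assumes a single query (one Iteration) and the usual BLAST XML element names (Iteration_query-len, Hit_id, Hsp_query-from, Hsp_qseq, Hsp_hseq), and it simply drops residues that are insertions relative to the query, so it is not identical to what BLAST itself prints for the flat query-anchored format.

```python
import xml.etree.ElementTree as ET


def read_hsps(xml_text):
    """Return (query_len, hsps) from BLAST XML text for a single query.

    Each HSP is a tuple (hit_id, query_from, aligned_query, aligned_subject).
    Hypothetical helper; assumes the usual BLAST XML element names.
    """
    root = ET.fromstring(xml_text)
    iteration = root.find(".//Iteration")  # assumption: only one query
    query_len = int(iteration.findtext("Iteration_query-len"))
    hsps = []
    for hit in iteration.iter("Hit"):
        hit_id = hit.findtext("Hit_id")
        for hsp in hit.iter("Hsp"):
            hsps.append((
                hit_id,
                int(hsp.findtext("Hsp_query-from")),
                hsp.findtext("Hsp_qseq"),
                hsp.findtext("Hsp_hseq"),
            ))
    return query_len, hsps


def query_anchored_row(query_len, query_from, qseq, hseq):
    """Project one pairwise HSP onto the query: one output column per query residue."""
    row = ["-"] * query_len
    pos = query_from - 1  # BLAST coordinates are 1-based
    for q_char, h_char in zip(qseq, hseq):
        if q_char == "-":
            continue  # insertion relative to the query: dropped in this sketch
        row[pos] = h_char  # a gap in the subject is copied as "-"
        pos += 1
    return "".join(row)


def query_anchored_msa(xml_text):
    """Return a list of (hit_id, row) pairs, all rows having the query length."""
    query_len, hsps = read_hsps(xml_text)
    return [(hit_id, query_anchored_row(query_len, start, qseq, hseq))
            for hit_id, start, qseq, hseq in hsps]
```

Every row has the same length as the query, so the rows can be stacked under the query sequence for column-wise analysis. Positions not covered by an HSP are left as "-".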
