Question: How To Parse Psiblast Results Using Biopython And Blast-2.2.24+?
4
gravatar for Niek De Klein
8.6 years ago by
Niek De Klein2.5k
Netherlands
Niek De Klein2.5k wrote:

I'm trying to run a PSIBlast program which selects certain sequences out at every round before it does the next iteration. For this I need the Round attribute shown in http://biopython.org/DIST/docs/tutorial/Tutorial.html#fig:psiblastrecord.<br< a="">> The biopython tutorial says: "In Biopython, the parsers return Record objects, either Blast or PSIBlast depending on what you are parsing." However, I can only access attributes found in the normal Blast record class: http://biopython.org/DIST/docs/tutorial/Tutorial.html#fig:blastrecord.

from Bio.Blast.Applications import NcbipsiblastCommandline from Bio import SeqIO from Bio.Blast import NCBIXML

File = "KNATM"

def psiBlast(File):
    fastaFile = open("BLAST-"+File+".txt","r")

    my_blast_db = r"C:\Niek\Test2.2.17\TAIR9_pep_20090619.fasta"
    my_blast_file = '"C:\\Niek\\Evolution MiP\\BLAST-'+File+'.txt"'
    my_blast_exe = r"C:\Niek\blast-2.2.24+\bin\psiblast.exe"    
    E_VALUE_TRESH = "10"

    for seq_record in SeqIO.parse("BLAST-"+File+".txt", "fasta"):
        global cline
        tempFile = open("tempFile.fasta","w")
        tempFile.write(">"+strseq_record.id)+"\n"+str(seq_record.seq)+"\n")
        tempFile.close()
        cline = NcbipsiblastCommandline(cmd = my_blast_exe, db = my_blast_db, \
                query = "tempFile.fasta", evalue = E_VALUE_TRESH, outfmt = 5, \
                out = "1stIte"+File+".xml", out_pssm = "pssm"+File)
        cline()
        result_handle = open("1stIte"+File+".xml","r")

        blast_record = NCBIXML.read(result_handle)
        for alignment in blast_record.alignments:
            for alignment in blast_record.alignments:
                print alignment.title
                for hsp in alignment.hsps:
                    print hsp.expect

psiBlast(File)

So this works, but if I change

blast_record = NCBIXML.read(result_handle)
for Round in blast_record.rounds:
    etc...

I get

Traceback (most recent call last):
  File "C:\Niek\Evolution MiP\psiBLAST1.1.py", line 45, in <module>
    psiBlast(File)
  File "C:\Niek\Evolution MiP\psiBLAST1.1.py", line 36, in psiBlast
    for Round in blast_record.rounds:
AttributeError: Blast instance has no attribute 'rounds

So how do you have to blast/parse to get the PSIBLAST record?

Thanks, Niek

python biopython blast • 4.2k views
ADD COMMENTlink modified 8.5 years ago by Pierre Lindenbaum119k • written 8.6 years ago by Niek De Klein2.5k

What version of BioPython are you using? I think the psiblast_record object is newer than NcbipsiblastCommandline itself.

ADD REPLYlink written 8.5 years ago by Michael Schubert6.9k
3
gravatar for Pierre Lindenbaum
8.6 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:

Not for python and not for PSI-BLAST but my answer might help: instead of trying to parse your XML with Python, how about parsing it with XML+XSLT ? For example, I wrote the stylesheet below to generate an input for mongodb from BLAST:


<xsl:stylesheet xmlns:xsl="&lt;a href=" <a="" href="http://www.w3.org/1999/XSL/Transform" rel="nofollow">http://www.w3.org/1999/XSL/Transform" "="" rel="nofollow">http://www.w3.org/1999/XSL/Transform'
    version='1.0'
    >

<xsl:output method="text"/>

<xsl:param name="database">blastdb</xsl:param>

<xsl:template match="/">
<xsl:apply-templates select="BlastOutput"/>
</xsl:template>

<xsl:template match="BlastOutput">
use <xsl:value-of select="$database"/>;

query={
      def:<xsl:apply-templates select="BlastOutput_query-def" mode="text"/>,
      len:<xsl:apply-templates select="BlastOutput_query-len"/>
      <xsl:apply-templates select="BlastOutput_param/Parameters"/>
      };
db.queries.save(query);
<xsl:apply-templates select="BlastOutput_iterations/Iteration/Iteration_hits/Hit"/>
</xsl:template>

<xsl:template match="Parameters">
,param: {
    expect:<xsl:apply-templates select="Parameters_expect"/>,
    sc_match:<xsl:apply-templates select="Parameters_sc-match"/>,
    sc_mismatch:<xsl:apply-templates select="Parameters_sc-mismatch"/>,
    gap_open:<xsl:apply-templates select="Parameters_gap-open"/>,
    gap_extend:<xsl:apply-templates select="Parameters_gap-extend"/>,
    filter:<xsl:apply-templates select="Parameters_filter" mode="text"/>,
    }
</xsl:template>

<xsl:template match="Hit">
hit={
query_id: query._id,
num:<xsl:apply-templates select="Hit_num"/>,
<xsl:apply-templates select="Hit_id"/>
def:<xsl:apply-templates select="Hit_def" mode="text"/>,
acn:<xsl:apply-templates select="Hit_accession" mode="text"/>,
len: <xsl:apply-templates select="Hit_len"/>,
hsp:[<xsl:for-each select="Hit_hsps/Hsp">
<xsl:if test="position()!=1">,</xsl:if>
<xsl:apply-templates select="."/>
</xsl:for-each>
]
};
db.hits.save(hit);
</xsl:template>

<xsl:template match="Hsp">
{
num:<xsl:apply-templates select="Hsp_num"/>,
bit_score:<xsl:apply-templates select="Hsp_bit-score"/>,
score:<xsl:apply-templates select="Hsp_score"/>,
evalue:<xsl:apply-templates select="Hsp_evalue"/>,
query_from:<xsl:apply-templates select="Hsp_query-from"/>,
query_to:<xsl:apply-templates select="Hsp_query-to"/>,
hit_from:<xsl:apply-templates select="Hsp_hit-from"/>,
hit_to:<xsl:apply-templates select="Hsp_hit-to"/>,
query_frame:<xsl:apply-templates select="Hsp_query-frame"/>,
hit_frame:<xsl:apply-templates select="Hsp_hit-frame"/>,
identity:<xsl:apply-templates select="Hsp_identity"/>,
positive:<xsl:apply-templates select="Hsp_positive"/>,
gaps:<xsl:apply-templates select="Hsp_gaps"/>,
align_len:<xsl:apply-templates select="Hsp_align-len"/>,
qseq:<xsl:apply-templates select="Hsp_qseq" mode="text"/>,
midline:<xsl:apply-templates select="Hsp_midline" mode="text"/>,
hseq:<xsl:apply-templates select="Hsp_hseq" mode="text"/>
}
</xsl:template>

<xsl:template match="Hit_id">
id:<xsl:apply-templates select="." mode="text"/>,
<xsl:if test="starts-with(.,'gi|')">
<xsl:variable name="gi" select="substring-before(substring-after(.,'gi|'),'|')"/>
gi:<xsl:value-of select="$gi"/>,
</xsl:if>
</xsl:template>

<xsl:template match="*" mode="text">
<xsl:text>"</xsl:text>
<xsl:call-template name="escape">
<xsl:with-param name="s" select="."/>
</xsl:call-template>
<xsl:text>"</xsl:text>
</xsl:template>

<xsl:template name="escape">
<xsl:param name="s"/>
  <xsl:choose>
        <xsl:when test="contains($s,'" ')"="">
                <xsl:value-of select="substring-before($s,'" ')"=""/>
                <xsl:text>\"</xsl:text>
                <xsl:call-template name="escape">
                        <xsl:with-param name="s" select="substring-after($s,'" ')"=""/>
                </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
                <xsl:value-of select="$s"/>
        </xsl:otherwise>
  </xsl:choose>
</xsl:template>

</xsl:stylesheet>

Result:

 xsltproc --novalid stylesheet.xsl ~/blast.xml

use blastdb;

query={
      def:"No definition line",
      len:670
,param: {
        expect:10,
        sc_match:2,
        sc_mismatch:-3,
        gap_open:5,
        gap_extend:2,
        filter:"L;m;",
        }

      };
db.queries.save(query);

hit={
query_id: query._id,
num:1,

id:"gi|118082669|ref|XM_416233.2|",

gi:118082669,

def:"PREDICTED: Gallus gallus similar to ubiquitous tetratricopeptide containing protein RoXaN; Rotavirus X asso
ciated non-structural protein (LOC417996), mRNA",
acn:"XM_416233",
len: 2868,
hsp:[
{
num:1,
bit_score:544.1,
score:602,
evalue:3.34611e-151,
query_from:92,
query_to:395,
hit_from:2378,
hit_to:2681,
query_frame:1,
hit_frame:1,
identity:303,
positive:303,
gaps:0,
align_len:304,
qseq:"TACTAGATATGCAGCAGACCTATGACATGTGGCTAAAGAAACACAATCCTGGGAAGCCTGGAGAGGGAACACCACTCACTTCGCGAGAAGGGGAGAAACAGATCCA
GATGCCCACTGACTATGCTGACATCATGATGGGCTACCACTGCTGGCTCTGCGGGAAGAACAGCAACAGCAAGAAGCAATGGCAGCAGCACATCCAGTCAGAGAAGCACAAG
GAGAAGGTCTTCACCTCAGACAGTGACTCCAGCTGCTGGAGCTATCGCTTCCCTATGGGCGAGTTCCAGCTCTGTGAAAGGTACCA",
midline:"|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||",
hseq:"TACTAGATATGCAGCAGACCTATGACATGTGGCTAAAGAAACACAATCCTGGGAAGCCTGGAGAGGGAACACCACTCACTTCGCGAGAAGGGGAGAAACAGATCCA
GATGCCCACTGACTATGCTGACATCATGATGGGCTACCACTGCTGGCTCTGCGGGAAGAACAGCAACAGCAAGAAGCAATGGCAGCAGCACATCCAGTCAGAGAAGCACAAG
GAGAAGGTCTTCACCTCAGACAGTGACTCCAGCTGCTGGAGCTATCGCTTCCCTATGGGCGAGTTCCAGCTCTGTGAAAGGTTCCA"
}
(...)
ADD COMMENTlink written 8.6 years ago by Pierre Lindenbaum119k

Well I've never worked with XSLT so it would be easier if someone knows how to parse it right using biopython + it all has to be done automatically, but I'll keep it in mind. Thanks

ADD REPLYlink written 8.6 years ago by Niek De Klein2.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1728 users visited in the last hour