Biopython SearchIO with hmmsearch 2 error
9.2 years ago
MAK ▴ 10

I want to parse the output file of hmmsearch (HMMER 2) with Biopython's SearchIO:

for qresult in SearchIO.read('test2.out', 'hmmer2-text'): 
    print qresult

I get this error:

Traceback (most recent call last):
  File "file.py", line 3285, in <module>
    for qresult in SearchIO.read('test2.out', 'hmmer2-text'):
  File "/usr/lib/python2.7/site-packages/Bio/SearchIO/__init__.py", line 359, in read
    first = next(generator)
  File "/usr/lib/python2.7/site-packages/Bio/SearchIO/__init__.py", line 316, in parse
    for qresult in generator:
  File "/usr/lib/python2.7/site-packages/Bio/SearchIO/HmmerIO/hmmer2_text.py", line 45, in __iter__
    for qresult in self.parse_qresult():
  File "/usr/lib/python2.7/site-packages/Bio/SearchIO/HmmerIO/hmmer2_text.py", line 126, in parse_qresult
    self.parse_hsp_alignments()
  File "/usr/lib/python2.7/site-packages/Bio/SearchIO/HmmerIO/hmmer2_text.py", line 286, in parse_hsp_alignments
    otherseq += self.line[19:].split()[0].strip()
IndexError: list index out of range
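
For reference, the failing expression in the last frame takes everything from column 19 onward, splits it on whitespace, and picks the first field. It raises `IndexError` whenever nothing but whitespace is left after column 19. The sketch below only reproduces that expression on sample strings; `sequence_field` and `raises_index_error` are hypothetical helper names, not part of Biopython, and it does not establish which line of the file triggers the error.

```python
def sequence_field(line, offset=19):
    """Return the first whitespace-separated field after `offset`, or None."""
    fields = line[offset:].split()
    if not fields:
        # The parser's expression would raise IndexError here.
        return None
    return fields[0]


def raises_index_error(line):
    """Run the exact expression from the traceback and report whether it fails."""
    try:
        line[19:].split()[0]
    except IndexError:
        return True
    return False


# A sequence line copied from the alignment block of the report.
good = "  219678_cox   500 PPYHTFEEPAFVQV    513  "

print(sequence_field(good))
print(raises_index_error(good))
print(raises_index_error(""))          # empty line
print(raises_index_error(" " * 30))    # whitespace-only line
```

So the parser appears to have reached a line where it expected aligned sequence but found an empty or too-short line instead.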

The h2.out file:

hmmsearch - search a sequence database with a profile HMM
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                   ../../RefSeq/outputdir/GM-refseq63/NC_020772/1244200_cox1.hm [1244200_cox1]
Sequence database:          ../data/trees_and_data/fasta_files-all/219678_cox1.fas
per-sequence score cutoff:  [none]
per-domain score cutoff:    [none]
per-sequence Eval cutoff:   <= 10        
per-domain Eval cutoff:     [none]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query HMM:   1244200_cox1
Accession:   [none]
Description: [none]
  [HMM has been calibrated; E-values are empirical estimates]

Scores for complete sequences (score includes all domains):
Sequence    Description                                 Score    E-value  N 
--------    -----------                                 -----    ------- ---
219678_cox1                                            1120.6          0   1

Parsed for domains:
Sequence    Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------    ------- ----- -----    ----- -----      -----  -------
219678_cox1   1/1       3   513 ..     1   511 []  1120.6        0

Alignments of top-scoring domains:
219678_cox1: domain 1 of 1, from 3 to 513: score 1120.6, E = 0
                   *->mnKWLFstnHKDiGtLYFmFGmWsGmvGssmsWiiRiELGqPGAFiG
                      + +W FstnHKDiGtLY +FG W+GmvG ++s +iR EL qPG  +G
  219678_cox     3    ITRWFFSTNHKDIGTLYLVFGAWAGMVGTALSLLIRAELSQPGSLLG 49   

                   nDqiYnvvvtAHAFimiFFmvmPimiGGFGnWLvPLmiGAPDmAFPRmnn
                    DqiYnv+vtAHAF+miFFmvmPi+iGGFGnWLvPLmiGAPDmAFPRmnn
  219678_cox    50 DDQIYNVIVTAHAFVMIFFMVMPILIGGFGNWLVPLMIGAPDMAFPRMNN 99   

                   msFWLLPPsLtLLmAssFiEmGAGtGWtvYPPLsnsLFHsGPsvDLAiFs
                   msFWLLPPs  LL+Ass +E GAGtGWtvYPPL+  L H+G svDL iFs
  219678_cox   100 MSFWLLPPSFLLLLASSGVEAGAGTGWTVYPPLAGNLAHAGASVDLTIFS 149  

                   LHLAGvssiLGAinFistiinmRPtGmiPERiPLFvWsvGitALLLLLsL
                   LHLAGvssiLGA+nFi tiinm+P        PLFvWsv +tA+LLLLsL
  219678_cox   150 LHLAGVSSILGAVNFITTIINMKPPATSQYQTPLFVWSVLVTAVLLLLSL 199  

                   PvLAGAitmLLtDRnFntsFFnPtGGGDPiLYqHLFWFFGHPEvYiLiLP
                   PvLA  itmLLtDRn nt FF P GGGDPiLYqHLFWFFGHPEvYiLiLP
  219678_cox   200 PVLAAGITMLLTDRNLNTTFFDPAGGGDPILYQHLFWFFGHPEVYILILP 249  

                   GFGLisHiisqEsGKnEtFGvLGmiYAmmAiGLLGFivWAHHmFtvGmDv
                   GFG+isH+++  sGK E FG +Gm++AmmAiGLLGFivWAHHmFtvGmDv
  219678_cox   250 GFGIISHVVAYYSGKKEPFGYMGMVWAMMAIGLLGFIVWAHHMFTVGMDV 299  

                   DtRAYFtsAtmiiAvPtGiKiFsWLAtLHGvHvKYtPsmLWALGFvFLFt
                   DtRAYFtsAtmiiA+PtG+K+FsWLAtLHG  +K+   mLWALGF+FLFt
  219678_cox   300 DTRAYFTSATMIIAIPTGVKVFSWLATLHGGSIKWETPMLWALGFIFLFT 349  

                   iGGLtGviLAnssiDivLHDtYYvvAHFHYvLsmGAvFAimGsFiqWYPL
                   +GGLtG++LAnss+DivLHDtYYvvAHFHYvLsmGAvFAim +F+ W+PL
  219678_cox   350 VGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIMAAFVHWFPL 399  

                   FtGmtmKnKWLKiqFGLmFiGvnmtFFPqHFLGLsGmPRRYsDYPDCYtt
                   F G t+   W Ki FG+mFiGvn+tFFPqHFLGL+GmPRRYsDYPD Yt 
  219678_cox   400 FSGYTLNDTWTKIHFGVMFIGVNLTFFPQHFLGLAGMPRRYSDYPDAYTL 449  

                   WniistiGstLsmLsiFmFimiLWEsmisKRmmLFssnmtssiEWLqKtP
                   Wn +s iGs +s++++ mF+ iLWE+  +KR +      t  +EWL   P
  219678_cox   450 WNTVSSIGSLISLVAVIMFLFILWEAFAAKREVSSVELTTTNVEWLHGCP 499  

                   PAEHsYCELPmLns<-*
                   P  H + E   +     
  219678_cox   500 PPYHTFEEPAFVQV    513  

Histogram of all scores:
score    obs    exp  (one = represents 1 sequences)
-----    ---    ---
 1120      1      0|=                                                          

% Statistical details of theoretical EVD fit:
              mu =  -346.4556
          lambda =     0.0755
chi-sq statistic =     0.0000
  P(chi-square)  =          0

Total sequences searched: 1

Whole sequence top hits:
tophits_s report:
     Total hits:           1
     Satisfying E cutoff:  1
     Total memory:         16K

Domain top hits:
tophits_s report:
     Total hits:           1
     Satisfying E cutoff:  1
     Total memory:         17K

Any idea what's wrong?
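
A side note on the loop itself: `SearchIO.read` returns a single `QueryResult`, so iterating over its return value walks through hits, not query results. To loop over query results, `SearchIO.parse` is the usual choice. A minimal sketch, assuming Biopython is installed; `iter_query_results` is a hypothetical wrapper name used only for illustration:

```python
def iter_query_results(path, fmt="hmmer2-text"):
    """Yield one QueryResult per query found in a search output file."""
    # Imported lazily so the function can be defined without Biopython present.
    from Bio import SearchIO

    for qresult in SearchIO.parse(path, fmt):
        yield qresult


# Usage (file name taken from the question):
# for qresult in iter_query_results("test2.out"):
#     print(qresult)
```

This is not expected to fix the `IndexError` on its own: the traceback shows `read` calling `parse` internally, so the same HMMER 2 text parser runs either way.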

biopython hmmsearch hmmer python • 2.2k views

Which version of Biopython are you using?

P.S. I modified the question formatting to mark the exception and file as "code" to display more clearly.
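
One quick way to check is to print `Bio.__version__`. The sketch below wraps that in a hypothetical helper, `biopython_version`, which returns `None` if Biopython cannot be imported:

```python
import importlib


def biopython_version():
    """Return the installed Biopython version string, or None if unavailable."""
    try:
        bio = importlib.import_module("Bio")
    except ImportError:
        return None
    return getattr(bio, "__version__", None)


print(biopython_version())
```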
