9.2 years ago
MAK
I want to parse the hmmsearch output file from HMMER2 with Biopython's SearchIO:
from Bio import SearchIO

for qresult in SearchIO.read('test2.out', 'hmmer2-text'):
    print(qresult)
I get this error:
Traceback (most recent call last):
  File "file.py", line 3285, in <module>
    for qresult in SearchIO.read('test2.out', 'hmmer2-text'):
  File "/usr/lib/python2.7/site-packages/Bio/SearchIO/__init__.py", line 359, in read
    first = next(generator)
  File "/usr/lib/python2.7/site-packages/Bio/SearchIO/__init__.py", line 316, in parse
    for qresult in generator:
  File "/usr/lib/python2.7/site-packages/Bio/SearchIO/HmmerIO/hmmer2_text.py", line 45, in __iter__
    for qresult in self.parse_qresult():
  File "/usr/lib/python2.7/site-packages/Bio/SearchIO/HmmerIO/hmmer2_text.py", line 126, in parse_qresult
    self.parse_hsp_alignments()
  File "/usr/lib/python2.7/site-packages/Bio/SearchIO/HmmerIO/hmmer2_text.py", line 286, in parse_hsp_alignments
    otherseq += self.line[19:].split()[0].strip()
IndexError: list index out of range
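For reference, the failing expression in the last frame, `self.line[19:].split()[0]`, raises this error whenever the current line has nothing but whitespace from column 19 onward, because `split()` then returns an empty list. The sketch below reproduces that in isolation and scans some lines for the condition. The helper name `empty_after_offset` and the sample lines are made up for illustration. They are not taken from Biopython or from the real file, and this does not establish which line of the report actually triggers the error.

```python
def empty_after_offset(lines, offset=19):
    """Return (line_number, text) pairs where line[offset:].split() is empty.

    Hypothetical diagnostic helper: these are the lines on which an
    expression like line[offset:].split()[0] would raise IndexError.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if not line[offset:].split():
            hits.append((number, line.rstrip("\n")))
    return hits


# Illustrative sample lines only (not copied from the real report).
sample = [
    "  219678_cox     3    ITRWFFSTNHKDIGTLYLVFGAWAGMVGTALSLLIRAELSQPGSLLG    49\n",
    "219678_cox 500\n",  # shorter than 19 characters, so the slice is empty
    "\n",                # blank line
]
hits = empty_after_offset(sample)

# The same failure mode, reproduced directly on a short string.
try:
    "short line"[19:].split()[0]
except IndexError as exc:
    message = str(exc)

print(hits)
print(message)
```

Running the same helper over the lines of the real output file (for example `empty_after_offset(open('test2.out').readlines())`) would list every line that is short or blank past column 19. Many of those are expected in a normal report, so the result only narrows down candidates inside the alignment section.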
The h2.out file:
hmmsearch - search a sequence database with a profile HMM
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: ../../RefSeq/outputdir/GM-refseq63/NC_020772/1244200_cox1.hm [1244200_cox1]
Sequence database: ../data/trees_and_data/fasta_files-all/219678_cox1.fas
per-sequence score cutoff: [none]
per-domain score cutoff: [none]
per-sequence Eval cutoff: <= 10
per-domain Eval cutoff: [none]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query HMM: 1244200_cox1
Accession: [none]
Description: [none]
[HMM has been calibrated; E-values are empirical estimates]
Scores for complete sequences (score includes all domains):
Sequence     Description  Score   E-value  N
--------     -----------  -----   -------  ---
219678_cox1               1120.6  0        1
Parsed for domains:
Sequence     Domain   seq-f  seq-t      hmm-f  hmm-t      score   E-value
--------     -------  -----  -----      -----  -----      -----   -------
219678_cox1  1/1      3      513    ..  1      511    []  1120.6  0
Alignments of top-scoring domains:
219678_cox1: domain 1 of 1, from 3 to 513: score 1120.6, E = 0
*->mnKWLFstnHKDiGtLYFmFGmWsGmvGssmsWiiRiELGqPGAFiG
+ +W FstnHKDiGtLY +FG W+GmvG ++s +iR EL qPG +G
219678_cox 3 ITRWFFSTNHKDIGTLYLVFGAWAGMVGTALSLLIRAELSQPGSLLG 49
nDqiYnvvvtAHAFimiFFmvmPimiGGFGnWLvPLmiGAPDmAFPRmnn
DqiYnv+vtAHAF+miFFmvmPi+iGGFGnWLvPLmiGAPDmAFPRmnn
219678_cox 50 DDQIYNVIVTAHAFVMIFFMVMPILIGGFGNWLVPLMIGAPDMAFPRMNN 99
msFWLLPPsLtLLmAssFiEmGAGtGWtvYPPLsnsLFHsGPsvDLAiFs
msFWLLPPs LL+Ass +E GAGtGWtvYPPL+ L H+G svDL iFs
219678_cox 100 MSFWLLPPSFLLLLASSGVEAGAGTGWTVYPPLAGNLAHAGASVDLTIFS 149
LHLAGvssiLGAinFistiinmRPtGmiPERiPLFvWsvGitALLLLLsL
LHLAGvssiLGA+nFi tiinm+P PLFvWsv +tA+LLLLsL
219678_cox 150 LHLAGVSSILGAVNFITTIINMKPPATSQYQTPLFVWSVLVTAVLLLLSL 199
PvLAGAitmLLtDRnFntsFFnPtGGGDPiLYqHLFWFFGHPEvYiLiLP
PvLA itmLLtDRn nt FF P GGGDPiLYqHLFWFFGHPEvYiLiLP
219678_cox 200 PVLAAGITMLLTDRNLNTTFFDPAGGGDPILYQHLFWFFGHPEVYILILP 249
GFGLisHiisqEsGKnEtFGvLGmiYAmmAiGLLGFivWAHHmFtvGmDv
GFG+isH+++ sGK E FG +Gm++AmmAiGLLGFivWAHHmFtvGmDv
219678_cox 250 GFGIISHVVAYYSGKKEPFGYMGMVWAMMAIGLLGFIVWAHHMFTVGMDV 299
DtRAYFtsAtmiiAvPtGiKiFsWLAtLHGvHvKYtPsmLWALGFvFLFt
DtRAYFtsAtmiiA+PtG+K+FsWLAtLHG +K+ mLWALGF+FLFt
219678_cox 300 DTRAYFTSATMIIAIPTGVKVFSWLATLHGGSIKWETPMLWALGFIFLFT 349
iGGLtGviLAnssiDivLHDtYYvvAHFHYvLsmGAvFAimGsFiqWYPL
+GGLtG++LAnss+DivLHDtYYvvAHFHYvLsmGAvFAim +F+ W+PL
219678_cox 350 VGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIMAAFVHWFPL 399
FtGmtmKnKWLKiqFGLmFiGvnmtFFPqHFLGLsGmPRRYsDYPDCYtt
F G t+ W Ki FG+mFiGvn+tFFPqHFLGL+GmPRRYsDYPD Yt
219678_cox 400 FSGYTLNDTWTKIHFGVMFIGVNLTFFPQHFLGLAGMPRRYSDYPDAYTL 449
WniistiGstLsmLsiFmFimiLWEsmisKRmmLFssnmtssiEWLqKtP
Wn +s iGs +s++++ mF+ iLWE+ +KR + t +EWL P
219678_cox 450 WNTVSSIGSLISLVAVIMFLFILWEAFAAKREVSSVELTTTNVEWLHGCP 499
PAEHsYCELPmLns<-*
P H + E +
219678_cox 500 PPYHTFEEPAFVQV 513
Histogram of all scores:
score obs exp (one = represents 1 sequences)
----- --- ---
1120 1 0|=
% Statistical details of theoretical EVD fit:
mu = -346.4556
lambda = 0.0755
chi-sq statistic = 0.0000
P(chi-square) = 0
Total sequences searched: 1
Whole sequence top hits:
tophits_s report:
Total hits: 1
Satisfying E cutoff: 1
Total memory: 16K
Domain top hits:
tophits_s report:
Total hits: 1
Satisfying E cutoff: 1
Total memory: 17K
Any idea what's wrong?
Which version of Biopython are you using?
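A quick way to check is to read the `__version__` attribute of the `Bio` package from the same interpreter that runs the script. This is a minimal sketch: the helper name `biopython_version` is just for illustration, and it returns `None` if Biopython cannot be imported.

```python
def biopython_version():
    """Return the installed Biopython version string, or None if unavailable."""
    try:
        import Bio
    except ImportError:
        return None
    return getattr(Bio, "__version__", None)


version = biopython_version()
print(version)
```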
P.S. I modified the question formatting to mark the exception and file as "code" to display more clearly.