Question: Bio.SearchIO.HmmerIO only showing the best score domain
1
gravatar for Selenocysteine
20 months ago by
Dublin, Ireland
Selenocysteine330 wrote:

Hi everyone,

I am trying to parse a table file I obtained with hmmscan by comparing a set of fastas against the whole Pfam database of protein domains. I am trying to use HmmerIO in Python to parse the ouput, but it seems to me that, for each fasta, it only keeps the domain with the highest evalue, while I want to keep all of those with a significant evalue.

Example:

Input file:

    #                                                                             --- full sequence ---- --- best 1 domain ---- --- domain number estimation ----
# target name        accession  query name                         accession    E-value  score  bias   E-value  score  bias   exp reg clu  ov env dom rep inc description of target
#------------------- ----------               -------------------- ---------- --------- ------ ----- --------- ------ -----   --- --- --- --- --- --- --- --- ---------------------
GST_N                PF02798.17 gi|528981796|ref|NC_021285.1|_5567 -            4.1e-16   58.9   0.0   6.9e-16   58.2   0.0   1.4   1   0   0   1   1   1   1 Glutathione S-transferase, N-terminal domain
GST_N_2              PF13409.3  gi|528981796|ref|NC_021285.1|_5567 -            3.3e-14   52.8   0.0   5.8e-14   52.0   0.0   1.4   1   0   0   1   1   1   1 Glutathione S-transferase, N-terminal domain
GST_N_3              PF13417.3  gi|528981796|ref|NC_021285.1|_5567 -            3.4e-14   52.8   0.0   5.4e-14   52.2   0.0   1.3   1   0   0   1   1   1   1 Glutathione S-transferase, N-terminal domain
GST_C                PF00043.22 gi|528981796|ref|NC_021285.1|_5567 -              7e-12   45.3   0.0   9.6e-12   44.8   0.0   1.2   1   0   0   1   1   1   1 Glutathione S-transferase, C-terminal domain
GST_C_2              PF13410.3  gi|528981796|ref|NC_021285.1|_5567 -            6.1e-07   29.2   0.4   9.3e-07   28.6   0.4   1.3   1   0   0   1   1   1   1 Glutathione S-transferase, C-terminal domain
GST_C_3              PF14497.3  gi|528981796|ref|NC_021285.1|_5567 -            0.00011   22.2   0.0   0.00017   21.6   0.0   1.2   1   0   0   1   1   1   1 Glutathione S-transferase, C-terminal domain

Here I found several statistically significant domains for the PF02798.17 gi|528981796|ref|NC_021285.1|_5567 query (fasta sequence), but the SearchIO output I have been able to get looks like this:

for qresult in SearchIO.parse(file1, 'hmmer3-tab'):
    print qresult.hits[0]

Query: gi|528981796|ref|NC_021285.1|_5567
       <unknown description>
  Hit: GST_N
       Glutathione S-transferase, N-terminal domain
 HSPs: ----  --------  ---------  ------  ---------------  ---------------------
          #   E-value  Bit score    Span      Query range              Hit range
       ----  --------  ---------  ------  ---------------  ---------------------
          0   6.9e-16      58.20       ?      [None:None]            [None:None]

I have no idea of how to access the other hits, even if I am quite sure there has to be a way.

Anyone has any experience with this?

Thanks a lot.

hmmer python • 592 views
ADD COMMENTlink modified 20 months ago • written 20 months ago by Selenocysteine330
4
gravatar for Selenocysteine
20 months ago by
Dublin, Ireland
Selenocysteine330 wrote:

I found the answer myself, it was a dumb coding mistake... If anyone ever needs it, the point is that every fasta sequence can have different hits and hits[0] will only show the first of the list.

The correct way to proceed is:

for qresult in SearchIO.parse(file1, 'hmmer3-tab'):
     for item in qresult.hits:
          print item
          print item.evalue # this will print the evalue of each domain hit
ADD COMMENTlink modified 18 months ago • written 20 months ago by Selenocysteine330

Is there a way I can print it to a file as a tab delimited.

Query, target name, filtered(evalue), qlen ? ##filtered evalue is nothing but i would like output below a specific evalue

Thanks.

ADD REPLYlink written 13 months ago by badribio200
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1096 users visited in the last hour