Question: Understanding hhblits output
Asked 5 weeks ago by mnsp088:

Hi everyone,

I just ran my first hhblits search (hhblits -cpu 4 -M first -i MSA/g_1.fa.out -d my_databases/my_db) and noticed multiple hits to the same cluster in my results file (e.g. see column 2 below). I'm guessing these represent different domains with homology to my query MSA that are all significant, but I wanted to double-check that this makes sense. Has anyone run this before and seen similar output?

 No   Hit          Prob   E-value P-value  Score  SS  Cols  Query HMM  Template HMM
  1 cluster_id_124 100.0   1E-42 6.7E-46  242.0   0.0  201   13-221   101-350 (396)
  2 cluster_id_124 100.0 1.6E-42   1E-45  241.0   0.0  202    7-219    48-261 (396)
  6 cluster_id_124 100.0 9.2E-37 6.1E-40  211.5   0.0  198   11-218   142-391 (396)

Also, my database is made up of ~2k HMMs. Why, then, does the results file report only 136 searched HMMs?

Query         g_1
Match_columns 229
No_of_seqs    1529 out of 22987
Neff          11.9485
Searched_HMMs 136

Thank you for any input.

hmm hhblits homology hhsuite • 121 views

Is this from a custom database?

The output looks reasonable at a glance, but I’ve not seen cluster_id_xxx before. I typically use hhsearch too, so there could be some difference in the program that I’m not accounting for.

I usually run my searches against the PDB, so I get PDB hits back.

Reply written 4 weeks ago by jrj.healey

Yes, this is from a custom database. Each HMM in my database is produced from a multiple sequence alignment of an ortholog group.

Do you also see duplicate hits when you search against the PDB?

Reply written 4 weeks ago (modified 4 weeks ago) by mnsp088

Yep, it's quite common to get multiple hits with the PDB. This can happen because there are multiple internal matches within a sequence (e.g. repetitive spans) or because there are multiple domains.

It’s also quite common for the same PDB ID to come up when matching structures with multiple similar or identical chains: e.g. match 1 might be PDB ID 123A chain A and match 2 might be PDB ID 123A chain B, but both would come up as 123A.

Reply written 4 weeks ago by jrj.healey

That makes sense, thank you for the explanation, jrj.healey!

Reply written 4 weeks ago by mnsp088
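
One way to tell apart the two cases raised in the thread (separate domains versus repeated internal matches) is to compare the Query HMM and Template HMM ranges of hits that share a name. The following is a minimal Python sketch, not part of HH-suite: the helper names (`parse_hits`, `overlap`, `group_by_hit`) are hypothetical, and it assumes the hit lines are whitespace-separated exactly as pasted in the question. Real result files use fixed-width columns, and hit names may contain spaces, so a robust parser would need more care.

```python
from collections import defaultdict

# Hit lines copied from the question; assumed to be whitespace-separated.
HITS = """\
  1 cluster_id_124 100.0   1E-42 6.7E-46  242.0   0.0  201   13-221   101-350 (396)
  2 cluster_id_124 100.0 1.6E-42   1E-45  241.0   0.0  202    7-219    48-261 (396)
  6 cluster_id_124 100.0 9.2E-37 6.1E-40  211.5   0.0  198   11-218   142-391 (396)
"""


def parse_range(text):
    """Turn a 'start-end' string into a tuple of two ints."""
    start, end = text.split("-")
    return int(start), int(end)


def parse_hits(block):
    """Parse hit lines into dicts; lines that do not look like hits are skipped."""
    hits = []
    for line in block.splitlines():
        fields = line.split()
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        hits.append({
            "no": int(fields[0]),
            "hit": fields[1],
            "query": parse_range(fields[8]),
            "template": parse_range(fields[9]),
        })
    return hits


def overlap(a, b):
    """Number of positions shared by two inclusive (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def group_by_hit(hits):
    """Group parsed hits by their hit name."""
    groups = defaultdict(list)
    for h in hits:
        groups[h["hit"]].append(h)
    return groups


hits = parse_hits(HITS)
for name, group in group_by_hit(hits).items():
    for i, a in enumerate(group):
        for b in group[i + 1:]:
            print(
                name, f"hits {a['no']} and {b['no']}:",
                "query overlap", overlap(a["query"], b["query"]),
                "template overlap", overlap(a["template"], b["template"]),
            )
```

If the query ranges of same-named hits barely overlap, the hits likely cover different regions of the query; if the query ranges nearly coincide while the template ranges are shifted, that would be more consistent with the same query region matching several positions in the template, e.g. a repetitive span.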