Question: Understanding hhblits output
Hi everyone,

I just ran my first hhblits search (hhblits -cpu 4 -M first -i MSA/g_1.fa.out -d my_databases/my_db) and noticed that there are multiple hits to the same cluster in my results file (e.g. see column 2 below). I'm guessing these represent different domains with homology to my query MSA that are all significant, but I wanted to double-check that this makes sense. Has anyone run this before and seen similar output?

 No   Hit          Prob   E-value P-value  Score  SS  Cols  Query HMM  Template HMM
  1 cluster_id_124 100.0   1E-42 6.7E-46  242.0   0.0  201   13-221   101-350 (396)
  2 cluster_id_124 100.0 1.6E-42   1E-45  241.0   0.0  202    7-219    48-261 (396)
  6 cluster_id_124 100.0 9.2E-37 6.1E-40  211.5   0.0  198   11-218   142-391 (396)

Also, my database is made up of ~2k HMMs. Why, then, does the results file report only 136 searched HMMs?

Query         g_1
Match_columns 229
No_of_seqs    1529 out of 22987
Neff          11.9485
Searched_HMMs 136

Thank you for any input.

Is this from a custom database?

The output looks reasonable at a glance, but I’ve not seen cluster_id_xxx before. I typically use hhsearch too, so there could be some difference in the program that I’m not accounting for.

I usually run my searches against the PDB, so I get PDB hits back.

Yes, this is from a custom database. Each HMM in my database is produced from a multiple sequence alignment of an ortholog group.

Do you also see duplicate hits when you use the PDB?

Yep, it's quite common to get multiple hits with the PDB. This can be because there are multiple internal matches within a sequence (e.g. repetitive spans) or multiple domains.

It’s also quite common for the same PDB ID to come up when matching to structures with multiple similar or identical chains, e.g. hit 1 might be PDB ID 123A chain A and hit 2 might be PDB ID 123A chain B, but both would come up as 123A.
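If it helps to see which stretch of the template each repeated hit covers, here is a minimal Python sketch that groups hit-list lines by hit name and prints their query and template ranges. It is only an illustration, not part of HH-suite: the function names (parse_hit, group_by_hit) are made up for this example, the three lines are copied from the question above, and it assumes that the last nine whitespace-separated fields on each line are Prob, E-value, P-value, Score, SS, Cols, Query HMM, Template HMM and the template length in parentheses. Real .hhr files are fixed-width, so lines where the template range and length run together would need extra handling.

```python
from collections import defaultdict

# Hit lines copied from the question; in practice these would be read
# from the hit list section of the hhblits results file.
HIT_LINES = """\
  1 cluster_id_124 100.0   1E-42 6.7E-46  242.0   0.0  201   13-221   101-350 (396)
  2 cluster_id_124 100.0 1.6E-42   1E-45  241.0   0.0  202    7-219    48-261 (396)
  6 cluster_id_124 100.0 9.2E-37 6.1E-40  211.5   0.0  198   11-218   142-391 (396)
"""


def parse_range(text):
    """Turn a 'start-end' string such as '13-221' into a tuple of ints."""
    start, end = text.split("-")
    return int(start), int(end)


def parse_hit(line):
    """Parse one hit-list line (hypothetical helper, see assumptions above)."""
    fields = line.split()
    # Assumption: the last nine fields are the numeric columns, so anything
    # between the rank and those fields is the hit name.
    prob, evalue, _pvalue, score, _ss, cols, query, template, length = fields[-9:]
    return {
        "rank": int(fields[0]),
        "name": " ".join(fields[1:-9]),
        "prob": float(prob),
        "evalue": float(evalue),
        "score": float(score),
        "cols": int(cols),
        "query": parse_range(query),
        "template": parse_range(template),
        "template_length": int(length.strip("()")),
    }


def group_by_hit(text):
    """Group parsed hits by hit name, keeping the original order."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if line.strip():
            hit = parse_hit(line)
            groups[hit["name"]].append(hit)
    return dict(groups)


groups = group_by_hit(HIT_LINES)
for name, hits in groups.items():
    if len(hits) > 1:
        print(f"{name}: {len(hits)} hits")
        for hit in hits:
            q_start, q_end = hit["query"]
            t_start, t_end = hit["template"]
            print(f"  hit {hit['rank']}: query {q_start}-{q_end}, "
                  f"template {t_start}-{t_end} of {hit['template_length']}")
```

Comparing the template ranges side by side makes it easier to judge whether repeated hits to one HMM land on separate regions or on overlapping, shifted ones.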

That makes sense, thank you for the explanation, jrj.healey!

