Question: Understanding hhblits output
mnsp088 wrote, 9 months ago:

Hi everyone,

I just ran my first hhblits search (hhblits -cpu 4 -M first -i MSA/g_1.fa.out -d my_databases/my_db) and noticed that my results file contains multiple hits to the same cluster (e.g. see column 2 below). I'm guessing these represent different domains with homology to my query MSA that are all significant, but I wanted to double-check that this makes sense. Has anyone run this before and seen similar output?

 No   Hit          Prob   E-value P-value  Score  SS  Cols  Query HMM  Template HMM
  1 cluster_id_124 100.0   1E-42 6.7E-46  242.0   0.0  201   13-221   101-350 (396)
  2 cluster_id_124 100.0 1.6E-42   1E-45  241.0   0.0  202    7-219    48-261 (396)
  6 cluster_id_124 100.0 9.2E-37 6.1E-40  211.5   0.0  198   11-218   142-391 (396)
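To see at a glance which templates are hit more than once, the hit-list rows can be grouped by template name. This is a minimal Python sketch, not part of HH-suite: it assumes rows laid out as in the table above and a hit name without spaces (in real .hhr files the Hit column is a truncated, fixed-width description that may contain spaces, so the regex would need adjusting). The function names are hypothetical.

```python
import re
from collections import defaultdict

# One row of the hit list in an .hhr file. Assumes the hit name contains no
# whitespace (true for names like cluster_id_124, not for every database).
HIT_ROW = re.compile(
    r"^\s*(?P<no>\d+)\s+(?P<hit>\S+)\s+(?P<prob>[\d.]+)\s+(?P<evalue>\S+)\s+"
    r"(?P<pvalue>\S+)\s+(?P<score>[-\d.]+)\s+(?P<ss>[-\d.]+)\s+(?P<cols>\d+)\s+"
    r"(?P<qstart>\d+)-(?P<qend>\d+)\s+(?P<tstart>\d+)-(?P<tend>\d+)\s+"
    r"\((?P<tlen>\d+)\)"
)


def parse_hits(lines):
    """Return one dict per hit-list row; header and other lines are skipped."""
    hits = []
    for line in lines:
        m = HIT_ROW.match(line)
        if m is None:
            continue
        hits.append({
            "no": int(m["no"]),
            "hit": m["hit"],
            "prob": float(m["prob"]),
            "evalue": float(m["evalue"]),
            "query": (int(m["qstart"]), int(m["qend"])),
            "template": (int(m["tstart"]), int(m["tend"])),
            "template_len": int(m["tlen"]),
        })
    return hits


def hits_per_template(hits):
    """Group parsed hits by template name, keeping hit-list order."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["hit"]].append(hit)
    return dict(groups)


rows = """\
 No   Hit          Prob   E-value P-value  Score  SS  Cols  Query HMM  Template HMM
  1 cluster_id_124 100.0   1E-42 6.7E-46  242.0   0.0  201   13-221   101-350 (396)
  2 cluster_id_124 100.0 1.6E-42   1E-45  241.0   0.0  202    7-219    48-261 (396)
  6 cluster_id_124 100.0 9.2E-37 6.1E-40  211.5   0.0  198   11-218   142-391 (396)
""".splitlines()

for name, group in hits_per_template(parse_hits(rows)).items():
    # prints: cluster_id_124 [1, 2, 6] [(101, 350), (48, 261), (142, 391)]
    print(name, [h["no"] for h in group], [h["template"] for h in group])
```

Keeping the query and template ranges with each hit makes it easy to see whether repeated hits to one cluster land on different parts of the template.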

Also, my database is made up of ~2k HMMs, so why does the output results file report only 136 searched HMMs?

Query         g_1
Match_columns 229
No_of_seqs    1529 out of 22987
Neff          11.9485
Searched_HMMs 136
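To line these summary numbers up against the size of the database, the following minimal Python sketch reads the header fields and counts database entries. It only extracts the numbers and does not explain the discrepancy. It assumes the "Field value" header layout shown above and the usual HH-suite ffindex layout, in which an index file such as my_databases/my_db_cs219.ffindex has one line per entry; that file name and the helper names are assumptions, so check them against your own database.

```python
import os


def parse_hhr_header(lines):
    """Collect the 'Field value' summary lines at the top of an .hhr file."""
    header = {}
    for line in lines:
        if line.lstrip().startswith("No ") and "Hit" in line:
            break  # the hit list starts here, so the summary is over
        parts = line.split(None, 1)
        if len(parts) == 2:
            header[parts[0]] = parts[1].strip()
    return header


def count_ffindex_entries(path):
    """Count non-empty lines in an ffindex file (assumed: one line per entry)."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


summary = parse_hhr_header("""\
Query         g_1
Match_columns 229
No_of_seqs    1529 out of 22987
Neff          11.9485
Searched_HMMs 136
""".splitlines())
searched = int(summary["Searched_HMMs"])
print(summary["Query"], searched)  # prints: g_1 136

# Hypothetical path; adjust to your own database prefix.
index = "my_databases/my_db_cs219.ffindex"
if os.path.exists(index):
    total = count_ffindex_entries(index)
    print(f"{searched} of {total} database entries reported as searched")
```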

Thank you for any input.


Is this from a custom database?

The output looks reasonable at a glance, but I haven't seen cluster_id_xxx names before. I typically use hhsearch, though, so there could be some difference between the programs that I'm not accounting for.

I usually run my searches against the PDB, so I get PDB hits back.

written 9 months ago by Joe

Yes, this is from a custom database. Each HMM in my database is produced from a multiple sequence alignment of an ortholog group.

Do you also see duplicate hits when you search against the PDB?

written 9 months ago by mnsp088
Joe (United Kingdom) wrote, 9 months ago:

Yep, it's quite common to get multiple hits to the same entry with the PDB. This can happen because there are multiple internal matches within a sequence (e.g. repetitive spans) or multiple domains.

It's also quite common for the same PDB ID to come up repeatedly when matching structures with multiple similar or identical chains: e.g. match 1 might be PDB ID 123A chain A and match 2 might be PDB ID 123A chain B, but both would be reported as 123A.
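Whether repeated hits to one template land on different regions of it (as the domain or repeat explanation would suggest) can be checked by measuring how much their template ranges overlap. This is a minimal Python sketch; the ranges are the Template HMM columns of hits 1, 2 and 6 from the question, and the function name is hypothetical.

```python
from itertools import combinations


def range_overlap(a, b):
    """Columns shared by two inclusive 1-based ranges (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Template HMM ranges of hits 1, 2 and 6 to cluster_id_124 in the question.
template_ranges = {1: (101, 350), 2: (48, 261), 6: (142, 391)}

for (i, a), (j, b) in combinations(template_ranges.items(), 2):
    shared = range_overlap(a, b)
    shorter = min(a[1] - a[0], b[1] - b[0]) + 1
    # prints 161/75%, 209/84% and 120/56% for the three pairs
    print(f"hits {i} and {j}: {shared} shared columns "
          f"({shared / shorter:.0%} of the shorter span)")
```

For these three hits every pair overlaps substantially, so they do not map to separate, non-overlapping regions of the template; a check like this is worth doing before reading repeated hits as distinct domains or repeat copies.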

written 9 months ago by Joe

That makes sense, thank you for the explanation, jrj.healey!

written 9 months ago by mnsp088