I have to use the HMMScan tool to map an HMM alignment onto a specific PDB structure.

I don't know how to re-index the family residues based on a crystal structure, and I don't understand the output yet.

Could someone help me with this?


  • Query start/end - The start/end of the MEA alignment of this domain/hit with respect to the profile HMM, which directly relates to the query sequence for phmmer. For hmmsearch, the number corresponds to the match states that HMMER determined from the initial input alignment.
  • Target Envelope - The domain envelope on the sequence defines a subsequence for which there is substantial probability mass supporting a homologous domain/hit, whether or not a single discrete alignment can be identified. The envelope may extend beyond the positions of the MEA alignment.
  • Target Alignment - The start/end of the maximum expected accuracy (MEA) alignment of this domain with respect to the target sequence.
  • There can be multiple hits per sequence because HMMER performs local-local searches (meaning any subsequence of the query model can align to any subsequence of the target sequence).
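The same kinds of coordinates (model, alignment, envelope) also appear in the table that the command-line tools write with `--domtblout`. Below is a minimal sketch of pulling them out of one row, assuming the standard 22 whitespace-separated columns followed by a free-text description. The `DomainHit` type, the function name, and the example row are made up for illustration; only the 1-89 and 3-102 coordinates come from this thread.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    target: str    # for hmmscan: the profile HMM that was hit
    query: str     # for hmmscan: the sequence you searched with
    hmm_from: int  # first model position in the alignment
    hmm_to: int    # last model position in the alignment
    ali_from: int  # first sequence residue in the MEA alignment
    ali_to: int    # last sequence residue in the MEA alignment
    env_from: int  # envelope start on the sequence
    env_to: int    # envelope end on the sequence


def parse_domtblout_line(line: str) -> DomainHit:
    """Parse one non-comment row of a --domtblout table."""
    # Split at most 22 times so the trailing description stays in one piece.
    f = line.split(None, 22)
    return DomainHit(
        target=f[0], query=f[3],
        hmm_from=int(f[15]), hmm_to=int(f[16]),
        ali_from=int(f[17]), ali_to=int(f[18]),
        env_from=int(f[19]), env_to=int(f[20]),
    )


# Invented row: names, lengths, scores and envelope are placeholders.
example_row = (
    "example_family - 91 example_query - 104 1e-10 40.0 0.1 1 1 "
    "1e-12 1e-10 39.5 0.1 1 89 3 102 2 103 0.90 placeholder description"
)
hit = parse_domtblout_line(example_row)
print(hit.hmm_from, hit.hmm_to, hit.ali_from, hit.ali_to)
```

Lines starting with `#` in the real file are comments and should be skipped before calling the parser.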

The position values are basically telling you the start and end of the alignment between query sequence and model. The reason they're not at the same positions is because they're telling you different things. The start/stop of the query sequence is telling you the place within your peptide the domain was found. The model positions tell you the place where your query matched the model for that domain.

Your protein won't necessarily consist of one domain encompassing the whole sequence, and just because you hit a domain doesn't mean you match the whole model for that domain.

They're not the same length because HMMER may have inserted gaps into your target sequence or in the model for that domain during the alignment. Basically it is saying "this part of your query looks more like this part of this domain than one would expect by chance".

I'm not sure what you mean by "match up", but if you mean "where in my peptide is this domain", you only need to look at the positions that refer to your query sequence.

Hummm I understand...

But what if I have to re-index the model based on the positions where it matches the query?

If I have:

MODEL aarGkelftanCaaCHgaggggakqggapnlsgaaerysadsiaairanprqvsap..........avafekkpltaemparkqltdqeiadlaaYlms

MATCH  +++Gk++f  +C++CH ++ gg+ +  +pnl g+  r+++++ + +++++++ +              +++k+ ++++m+     +++e+adl+aYl++


If my alignment starts on position 3 and the model begins on position 1, it means that I have to re-index my residue 1 to 3. But I don't know what to do with the inserts and the gaps...

No, it means that the alignment starts at the 3rd residue of your query sequence.

ex: query = CCBABA, target = BABA



The alignment goes from 3-6 in your query sequence. This doesn't mean that residue 1 (C) becomes residue 3.

I'm not sure I understand what you mean by "reindex". What are you trying to do with the output of HMMER?

"For comparison with a crystal structure the family residue pairs should be re-indexed to be consistent with the Pfam domain mapping in the PDB protein"

"Once we have a list of ranked pairs of directly coupled HMM columns we can use them for validation in a known protein crystal structure. This is done to verify that these are real contacts in a protein domain, or for analysis"

That is what I have to do: map the alignment and then re-index the HMM positions according to the PDB. I need to know the positions of the amino acids to obtain the contact map.
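A rough sketch of that re-indexing step, assuming you already have a dictionary from HMM column to residue number in the PDB sequence. The names `reindex_pairs`, `hmm_to_residue` and `pdb_offset` are hypothetical, and the offset is only needed if the numbering in the PDB file differs from the position in the FASTA sequence by a constant shift, which is an assumption that has to be checked for each structure.

```python
def reindex_pairs(pairs, hmm_to_residue, pdb_offset=0):
    """Translate pairs of HMM columns into pairs of residue numbers.

    pairs          -- iterable of (hmm_col_i, hmm_col_j) tuples
    hmm_to_residue -- dict: HMM column -> 1-based position in the sequence
    pdb_offset     -- constant added to get PDB numbering (assumed shift)
    """
    reindexed = []
    for col_i, col_j in pairs:
        # A column with no aligned residue (a deletion in this protein)
        # cannot be placed on the structure, so the pair is skipped.
        if col_i in hmm_to_residue and col_j in hmm_to_residue:
            reindexed.append((hmm_to_residue[col_i] + pdb_offset,
                              hmm_to_residue[col_j] + pdb_offset))
    return reindexed


# Toy input: column 3 has no residue in this protein.
toy_mapping = {1: 3, 2: 4, 4: 6}
toy_pairs = [(1, 4), (2, 3), (1, 2)]
print(reindex_pairs(toy_pairs, toy_mapping))
```

The residue pairs that come out can then be compared with the distances measured in the crystal structure.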

I think what they're saying is that you only keep the parts of the query that aligned with the HMM. Throw out everything else.



You would only keep BABACAT from your query sequence. The first B becomes residue 1. You can use Gblocks for this.

As far as giving each residue a position (index), you just give it the value in the model. E.g. if query residue 1 aligned to HMM position 8, you give that residue an index of 8.
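To deal with the inserts and gaps, one option is to walk along the two aligned lines of the HMMER output together. This sketch assumes the usual text layout, where a `.` in the model line marks an insertion in the sequence and a `-` in the sequence line marks a deleted model column. The function name and the toy strings are made up for illustration.

```python
def map_hmm_to_query(model_aln, query_aln, hmm_from, ali_from):
    """Return {HMM column: query residue number} for the match columns.

    model_aln -- model (consensus) line of the alignment, '.' = insertion
    query_aln -- sequence line of the alignment, '-' = deletion
    hmm_from  -- model position of the first alignment column
    ali_from  -- sequence position of the first aligned residue
    """
    if len(model_aln) != len(query_aln):
        raise ValueError("aligned lines must have the same length")
    mapping = {}
    hmm_pos, seq_pos = hmm_from, ali_from
    for m, q in zip(model_aln, query_aln):
        is_model_col = m != "."
        has_residue = q != "-"
        if is_model_col and has_residue:
            mapping[hmm_pos] = seq_pos   # residue sits in a match column
        if is_model_col:
            hmm_pos += 1                 # model advances on match or deletion
        if has_residue:
            seq_pos += 1                 # sequence advances on match or insertion
    return mapping


# Toy alignment: one inserted residue (q) and one deleted column (c).
toy = map_hmm_to_query("ab.cd", "AKq-D", hmm_from=1, ali_from=3)
print(toy)
```

Inserted residues get no HMM index, and deleted model columns get no residue, so neither shows up in the dictionary.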

Can you link the paper you're quoting?

An HMM is sequence-based, not structure-based. You might want to get the FASTA file from PDB for the structure you have and run HMMScan against that FASTA sequence.

Yes, that's correct. I am using the FASTA sequence from a specific PDB.

I still don't understand the output file from HMMScan. For example, I am using PF00034 and PDB 1J3S.

The alignment starts on residue 3 and ends on residue 100. The model starts on residue 1 and ends on residue 89. How can I match these things?

I see 3->102, not 3->100

It is not clear what your question is. HMMScan aligns the alignable part of your query sequence to the HMM. Your query sequence has two additional residues at the N-terminus and some other insertions that are not present in the HMM.
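The difference in length can be checked with simple arithmetic. Here is a small sketch using the coordinates quoted in this thread (model 1 to 89, query 3 to 102); the function name and the dictionary keys are made up for illustration.

```python
def alignment_summary(hmm_from, hmm_to, ali_from, ali_to):
    """Compare the span covered on the model with the span on the query."""
    model_span = hmm_to - hmm_from + 1   # model columns covered
    query_span = ali_to - ali_from + 1   # query residues covered
    return {
        "model_span": model_span,
        "query_span": query_span,
        # Inserted residues minus deleted columns inside the aligned region.
        "net_insertions": query_span - model_span,
        # Query residues before the alignment starts.
        "unaligned_n_term": ali_from - 1,
    }


summary = alignment_summary(hmm_from=1, hmm_to=89, ali_from=3, ali_to=102)
print(summary)
```

The net figure only says how many more residues than model columns the aligned region contains. Where the insertions sit has to be read from the alignment lines themselves.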

