Question

How is a protein sequence classified by an HMM?

1

4.4 years ago

cirooliveira3 ▴ 10

I'm using Pfam HMMs to identify domains in my T. cruzi proteins with the HMMER program.

I kind of understand that an HMM is built from a family's multiple sequence alignment by calculating the probabilities of states.

But how does hmmscan classify a new protein against the family HMMs? Where can I read more about this? Academic references would be appreciated.

alignment sequencing HMMER hmmscan pfam • 961 views

updated 4.4 years ago by Mensur Dlakic ★ 27k • written 4.4 years ago by cirooliveira3 ▴ 10

2

Have you tried Googling? The Durbin book should have a chapter on HMMs, and HMM-based classification is a relatively common topic. You could start with the reference section of the HMMER paper and work backwards until you get to an explanation of how and why it's done.

4.4 years ago by Ram 43k

score 3 · Accepted Answer · 2019-11-25

In this context, HMMs are numerical representations of sequence alignments that are more information-rich because both residue substitution patterns and gap penalties are treated in a probabilistic fashion. Diverse sequences are given higher weight when the HMM is built, and the model also blends in background residue probabilities to an extent that is inversely proportional to the alignment depth.
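To make the idea concrete, here is a minimal Python sketch of a toy profile built from an ungapped alignment. It is not what HMMER actually does: it has match states only (no insert or delete states, so no probabilistic gap handling), no sequence weighting, a uniform background, and a simple pseudocount in place of HMMER's priors. All names (`build_profile`, `log_odds_score`, `pseudocount_weight`) are hypothetical. It only illustrates two points: background probabilities matter less as the alignment gets deeper, and a new sequence is scored by log-odds against the background.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: a uniform background; real tools use observed residue frequencies.
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_probabilities(column, pseudocount_weight=1.0):
    """Emission probabilities for one alignment column.

    The background contributes a share of weight / (depth + weight),
    so its influence shrinks as the alignment gets deeper.
    """
    counts = Counter(column)
    depth = len(column)
    return {
        aa: (counts[aa] + pseudocount_weight * BACKGROUND[aa])
        / (depth + pseudocount_weight)
        for aa in AMINO_ACIDS
    }


def build_profile(alignment, pseudocount_weight=1.0):
    """Build a match-state-only profile from equal-length, ungapped sequences."""
    return [column_probabilities(col, pseudocount_weight) for col in zip(*alignment)]


def log_odds_score(profile, sequence):
    """Sum of per-position log2(model probability / background probability)."""
    return sum(
        math.log2(col[aa] / BACKGROUND[aa]) for col, aa in zip(profile, sequence)
    )


# Toy family alignment (made-up data, for illustration only).
family = ["ACDE", "ACDE", "ACDF"]
profile = build_profile(family)
print(log_odds_score(profile, "ACDE"))  # family-like sequence: positive score
print(log_odds_score(profile, "WWWW"))  # unrelated sequence: negative score
```

A positive score means the sequence is better explained by the family model than by the background. hmmscan does conceptually the same comparison against every HMM in the database, but with full profile HMMs and E-value statistics on top.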

If you don't have access to the Durbin, Eddy, Krogh & Mitchison book (Biological Sequence Analysis), a PDF version is available online. Any of the early Krogh, Eddy and Karplus papers would be good starting points.

https://www.ncbi.nlm.nih.gov/pubmed/8744772

https://www.ncbi.nlm.nih.gov/pubmed/9918945

https://www.ncbi.nlm.nih.gov/pubmed/9927713

https://www.ncbi.nlm.nih.gov/pubmed/18075166