Question

Which Protein Domains Are More Appropriate? The Ones Detected by HMMER3, UniProt, SMART or Pfam?

3

12.3 years ago

Hari ▴ 280

Hi all, I already asked a question about the domain length inconsistency detected by HMMER3, and this question builds on that one. I have a set of protein domains detected by HMMER whose boundaries differ from the ones in UniProt, SMART and Pfam; the calls differ from each other by only a few residues. Which of these do you think is the most appropriate to use? I am aware that HMMER is the core utility that most protein domain databases are built on, yet I am unsure which set of boundaries to use.

Thanks in advance

protein hmm hmmer • 4.6k views

updated 12.3 years ago by John Van Dam ▴ 110 • written 12.3 years ago by Hari ▴ 280

score 6 · Answer 1 · 2012-01-03

Hi Hari,

First, I believe SMART and Pfam use different versions of HMMER to search: Pfam has switched to HMMER3, but SMART is still using HMMER2. This alone can account for some inconsistencies, because the HMMER2 ls/fs modes are not implemented in the same way in HMMER3. From memory, Pfam previously used HMMER2 in ls mode, which aligns the domain model globally, whereas HMMER3 does not implement ls as of yet and aligns the domain model locally. SMART also has its own models, distinct from Pfam's, so depending on how each database defined the domain you will find differences as well. SMART focuses more on signalling-related domains, while Pfam also contains models for protein families, not just domains. In addition, SMART maintains a separate list of cut-off values for specific domains when they occur in repeats; these are applied by filtering scripts after running hmmsearch.

So whether to use UniProt, SMART, Pfam or all of the above really depends on what your ultimate goal is. If you have a large dataset, I suggest HMMER3 + Pfam so that you do not have to wait eons; HMMER3 is also more sensitive. You can rebuild the SMART models as HMMER3 profiles yourself, but expect results that differ from the SMART database.
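If you go the HMMER3 + Pfam route, it may help to look at both coordinate pairs HMMER3 reports per domain, since the per-domain table (`--domtblout`) lists alignment coordinates and envelope coordinates separately, and these often differ by a few residues. Below is a minimal Python sketch for pulling them out, assuming the standard 22-column whitespace-delimited domtblout layout followed by a free-text description. The `DomainHit` type, the `parse_domtblout` helper and the example row are made up for illustration and are not part of HMMER.

```python
from typing import Iterable, List, NamedTuple


class DomainHit(NamedTuple):
    target: str      # column 1: target name (the Pfam model when using hmmscan)
    query: str       # column 4: query name (the protein when using hmmscan)
    i_evalue: float  # column 13: independent E-value of this domain
    ali_from: int    # columns 18-19: alignment start/end on the sequence
    ali_to: int
    env_from: int    # columns 20-21: envelope start/end on the sequence
    env_to: int


def parse_domtblout(lines: Iterable[str]) -> List[DomainHit]:
    """Parse per-domain rows, skipping comment and blank lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # The first 22 fields are whitespace-delimited; the rest is description.
        f = line.split(None, 22)
        hits.append(DomainHit(
            target=f[0],
            query=f[3],
            i_evalue=float(f[12]),
            ali_from=int(f[17]),
            ali_to=int(f[18]),
            env_from=int(f[19]),
            env_to=int(f[20]),
        ))
    return hits


# Made-up row in domtblout layout, for illustration only.
example_row = (
    "SH2 PF00017.27 77 MYPROT - 500 1e-20 70.0 0.1 1 1 "
    "1e-23 2e-20 69.5 0.1 2 76 152 228 150 230 0.95 SH2 domain"
)
hits = parse_domtblout([example_row])
print(hits[0])
```

In this made-up row the alignment spans residues 152-228 while the envelope spans 150-230, which is the kind of small offset that can show up when two resources report different coordinate types for the same hit.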

Food for thought: domain boundaries are very ill-defined, so domain hits that are off by 1 or 2 residues may not be much of a problem. If somebody can comment on this, please do!
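One way to act on this is to treat two domain calls as equivalent when both ends agree within a small tolerance. The sketch below is only an illustration of that idea: the `boundaries_agree` helper, the default tolerance of 2 residues and the coordinates are all assumptions, not values taken from any of the databases.

```python
from typing import Tuple

Call = Tuple[int, int]  # (start, end) of a domain on the protein, 1-based


def boundaries_agree(a: Call, b: Call, tolerance: int = 2) -> bool:
    """True if both the start and the end differ by at most `tolerance` residues."""
    return abs(a[0] - b[0]) <= tolerance and abs(a[1] - b[1]) <= tolerance


# Hypothetical calls for the same domain from three sources.
hmmer_call = (150, 230)
pfam_call = (152, 229)
smart_call = (140, 230)

print(boundaries_agree(hmmer_call, pfam_call))   # True: off by 2 and 1 residues
print(boundaries_agree(hmmer_call, smart_call))  # False: start differs by 10
```

What counts as a sensible tolerance depends on the downstream analysis, so the default here is a placeholder rather than a recommendation.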