Identify multidomain protein after hmmscan
6.8 years ago
dago ★ 2.7k

I am having trouble choosing an appropriate criterion for identifying the presence of multiple domains in proteins.

I ran an `hmmscan` search on a list of proteins with the `--tblout` flag.

The output reports several fields:

--- full sequence ---- --- best 1 domain ---- --- domain number estimation ----
# ..E-value  score  bias   E-value  score  bias   exp reg clu  ov env dom rep inc...
------------------- ---------- -------------------- ---------- --------- ------

Reading the manual, I think the first values to check are the E-values of both the full sequence and the best 1 domain. If the second is significantly lower than the full-sequence E-value, the results for this protein should be considered carefully.

I also understand that the resulting domains are listed in order of statistical significance, so the first one is the most likely to be there. Now, I have trouble understanding which parameter to consider when deciding whether I am dealing with a multi-domain protein or not. Should I consider just the "exp" value?
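For reference, here is a minimal Python sketch of how the fields above could be pulled out of a `--tblout` file, assuming the standard HMMER3 layout of 18 whitespace-separated columns followed by a free-text description. The names `TblHit` and `parse_tblout` and the example row are made up for illustration. As far as I understand the manual, "exp" is the expected number of domains, "dom" is the number of domains defined, and "inc" is the number of domains passing the inclusion thresholds, all for one profile against one sequence.

```python
from typing import NamedTuple


class TblHit(NamedTuple):
    """Subset of one --tblout row (hypothetical container for this sketch)."""
    target: str
    query: str
    full_evalue: float      # full sequence E-value
    best_dom_evalue: float  # best 1 domain E-value
    exp: float              # expected number of domains
    dom: int                # number of domains defined
    inc: int                # number of domains passing inclusion thresholds


def parse_tblout(lines):
    """Parse tblout rows, skipping blank lines and '#' comment lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 18 fixed columns, then the description, which may contain spaces.
        f = line.split(None, 18)
        hits.append(TblHit(f[0], f[2], float(f[4]), float(f[7]),
                           float(f[10]), int(f[15]), int(f[17])))
    return hits


# Invented example row, not real hmmscan output.
example = [
    "# comment line",
    "FamA - prot1 - 1e-30 100.0 0.1 2e-15 50.0 0.0 2.1 2 0 0 2 2 2 2 made-up family A",
]
hits = parse_tblout(example)
for h in hits:
    print(h.query, h.target, h.full_evalue, h.best_dom_evalue, h.exp, h.inc)
```

Note that each row describes one profile against one sequence, so these counts only say how many copies of that one family were found. A protein with domains from different families would show up as several rows with the same query name.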





hmmscan protein domain • 1.7k views
6.7 years ago
venu 7.0k

Just run hmmscan against the individual family profiles (the most significant one and the next one are enough), without the `--tblout` flag (and if you previously used `--noali`, remove that too). If you find a continuous gap in the alignment with the first profile, and that gap is filled by the second profile, it is a multidomain protein. If the protein is multidomain, the two families containing those domains come up as the first and second results in almost every case.
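The gap check described above can be sketched in Python. This is only an illustration under some assumptions: the alignment coordinates are inclusive 1-based (start, end) pairs on the protein, copied by hand from the alignment output or taken from a `--domtblout` table; the hits of the first profile do not overlap each other; and the 20% overlap cutoff is an arbitrary choice, not an HMMER default. The function names and the coordinates are hypothetical.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def fills_gap(first_hits, second_hit, max_overlap_frac=0.2):
    """Return True if second_hit lies mostly outside the first profile's hits.

    first_hits: list of (start, end) regions aligned to the first profile,
                assumed not to overlap each other.
    second_hit: (start, end) region aligned to the second profile.
    """
    length = second_hit[1] - second_hit[0] + 1
    shared = sum(overlap(h, second_hit) for h in first_hits)
    return shared / length <= max_overlap_frac


# Invented coordinates: the first profile leaves a gap at residues 121-259.
first = [(5, 120), (260, 380)]
print(fills_gap(first, (130, 250)))  # second profile sits inside the gap
print(fills_gap(first, (60, 200)))   # second profile overlaps the first hit
```

With these made-up numbers, the first call reports that the second profile fills the gap and the second call reports that it does not, because about 43% of that region is already covered by the first profile.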

