Identifying multidomain proteins after hmmscan
6.8 years ago
dago

I am having trouble choosing an appropriate criterion for identifying the presence of multiple domains in proteins.

I ran an `hmmscan` search on a list of proteins using the `--tblout` flag.

The output reports several fields:

    #      --- full sequence ----   --- best 1 domain ----   --- domain number estimation ----
    # ...  E-value  score  bias     E-value  score  bias     exp reg clu  ov env dom rep inc ...
    ------------------- ---------- -------------------- ---------- --------- ------
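To make these columns easier to work with, here is a minimal Python sketch that reads the table, assuming the standard HMMER3 `--tblout` layout (18 whitespace-separated columns followed by a free-text description of the target). The `TbloutHit` and `parse_tblout` names are my own, not part of HMMER.

```python
from dataclasses import dataclass


@dataclass
class TbloutHit:
    target: str          # profile name (with hmmscan: the family, e.g. a Pfam HMM)
    query: str           # protein sequence name
    full_evalue: float   # "full sequence" E-value
    full_score: float
    best_evalue: float   # "best 1 domain" E-value
    best_score: float
    exp: float           # expected number of domains
    dom: int             # number of domains defined
    inc: int             # number of domains included
    description: str


def parse_tblout(lines):
    """Parse hmmscan --tblout lines, skipping blank and '#' comment lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 18 fixed columns; everything after them is the free-text description
        f = line.split(maxsplit=18)
        hits.append(TbloutHit(
            target=f[0], query=f[2],
            full_evalue=float(f[4]), full_score=float(f[5]),
            best_evalue=float(f[7]), best_score=float(f[8]),
            exp=float(f[10]), dom=int(f[15]), inc=int(f[17]),
            description=f[18].strip() if len(f) > 18 else "",
        ))
    return hits
```

Since it accepts any iterable of lines, an open file works directly, e.g. `with open(path) as fh: hits = parse_tblout(fh)`.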

From reading the manual, I think the first values to check are the E-values of both the full sequence and the best 1 domain. If the second is significantly lower than the full-sequence E-value, the results for this protein should be considered carefully.
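For reference, the HMMER user guide describes a related pattern: a significant full-sequence E-value paired with a non-significant best-1-domain E-value suggests the score is spread over several weaker domains (possibly a multidomain remote homolog, or just a repetitive sequence). A small sketch of that check; the function name is hypothetical and the 0.01 cutoff is only an example, so pick one that suits your data.

```python
def flag_for_review(full_evalue, best_evalue, threshold=0.01):
    """Return True when the full sequence passes the threshold but the
    best single domain does not (example threshold, adjust as needed)."""
    return full_evalue <= threshold < best_evalue
```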

I also understand that the resulting domains are listed in order of statistical significance, so the first one is the most likely to really be there. What I do not understand is which parameter to use to decide whether or not I am dealing with a multidomain protein. Should I consider just the "exp" value?
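On the "exp" question, one detail of hmmscan output matters: each `--tblout` row is one query sequence matched against one profile, so `exp` (expected number of domains) and `dom` (number of domains defined) count copies of that single family in the protein, i.e. repeats. Different families in the same protein appear as separate rows for the same query. A sketch that groups rows by query under that reading; the tuple input format, the function names and the 1e-3 cutoff are hypothetical choices for illustration.

```python
from collections import defaultdict


def domain_summary(hits, max_evalue=1e-3):
    """Group hmmscan --tblout rows by query sequence.

    hits: iterable of (query, target, full_evalue, exp, dom) tuples.
    Returns {query: {target: (exp, dom)}} for rows passing max_evalue.
    """
    summary = defaultdict(dict)
    for query, target, evalue, exp, dom in hits:
        if evalue <= max_evalue:
            summary[query][target] = (exp, dom)
    return dict(summary)


def classify(families):
    """Label one query's {target: (exp, dom)} mapping."""
    if not families:
        return "no significant hit"
    if len(families) > 1:
        return "multiple families"  # distinct profiles hit the same protein
    if any(dom > 1 for _, dom in families.values()):
        return "one family, repeated"  # one profile, several domains
    return "single domain"
```

Here a protein counts as multidomain when several distinct profiles pass the cutoff, while a high `exp`/`dom` on a single row points to repeats of one family.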





6.7 years ago
venu

Just run hmmscan with the individual family profiles (the most significant one and the next one are enough) without the `--tblout` flag (and if you previously used `--noali`, remove that too). If you find a continuous gap in the alignment with the first profile, and that gap is filled by the second profile, it is a multidomain protein. When a protein is multidomain, the two families making up those domains come out as the first and second hits in almost every case.
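The gap check described above can also be done on coordinates instead of by eye. A rough sketch, assuming you already have the start/end positions on the protein where each profile aligns (for example from the `ali from`/`ali to` or `env from`/`env to` columns of `hmmscan --domtblout`, or read off the alignments). The function names and the `min_gap` and `min_frac` thresholds are arbitrary illustration choices, and unaligned stretches at the N- and C-termini are treated as gaps too.

```python
def uncovered(intervals, seq_len):
    """Yield (start, end) stretches of 1..seq_len not covered by any interval.

    Intervals are 1-based and inclusive; overlapping intervals are fine.
    """
    pos = 1  # first position not yet known to be covered
    for start, end in sorted(intervals):
        if start > pos:
            yield (pos, start - 1)
        pos = max(pos, end + 1)
    if pos <= seq_len:
        yield (pos, seq_len)


def fills_gap(first_hits, second_hits, seq_len, min_gap=30, min_frac=0.5):
    """True if a second-profile hit lies mostly inside a stretch of at least
    min_gap residues left unaligned by the first profile."""
    gaps = [(s, e) for s, e in uncovered(first_hits, seq_len)
            if e - s + 1 >= min_gap]
    for s2, e2 in second_hits:
        length = e2 - s2 + 1
        for gs, ge in gaps:
            overlap = min(e2, ge) - max(s2, gs) + 1  # negative if disjoint
            if overlap >= min_frac * length:
                return True
    return False
```

For example, a 400-residue protein whose first profile aligns at 10-270 and second at 300-380 gives `fills_gap([(10, 270)], [(300, 380)], 400) == True`, while a second hit at 100-180 (inside the first domain) gives `False`.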


