I have a list of proteins that define a family of enzymes.
I want to check whether these enzymes are present in a genome I am working with. To do so, I tried two approaches:
- running blastp with the above-mentioned sequences as queries against the genome
- building an HMM profile from the proteins and using hmmscan to search for them in the genome
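For reference, the two searches can be sketched roughly as follows. This is only a minimal sketch that assembles the command lines, assuming the genome is available as a FASTA file of predicted proteins and that the family proteins have already been aligned with some aligner. All file names are hypothetical placeholders, and the E-value cutoff is an arbitrary choice.

```python
import shlex


def blast_cmds(family_faa, proteome_faa, db_prefix, out_tsv, evalue=1e-5):
    """Commands for approach 1: blastp of the family proteins against the proteome."""
    return [
        # Build a protein BLAST database from the predicted proteins.
        ["makeblastdb", "-in", proteome_faa, "-dbtype", "prot", "-out", db_prefix],
        # Tabular output (-outfmt 6) is the easiest to parse afterwards.
        ["blastp", "-query", family_faa, "-db", db_prefix,
         "-outfmt", "6", "-evalue", str(evalue), "-out", out_tsv],
    ]


def hmmer_cmds(family_alignment, hmm_file, proteome_faa, out_tbl):
    """Commands for approach 2: profile HMM built from an alignment, then hmmscan."""
    return [
        # hmmbuild needs a multiple sequence alignment, not unaligned sequences.
        ["hmmbuild", hmm_file, family_alignment],
        # hmmscan requires the profile database to be pressed first.
        ["hmmpress", hmm_file],
        # Each proteome sequence is scanned against the profile.
        ["hmmscan", "--tblout", out_tbl, hmm_file, proteome_faa],
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmds = blast_cmds("family.faa", "proteome.faa", "proteome_db", "blast.tsv")
    cmds += hmmer_cmds("family.aln", "family.hmm", "proteome.faa", "hmmscan.tbl")
    for cmd in cmds:
        print(shlex.join(cmd))
```

The script only prints the commands; each list could be passed to `subprocess.run(cmd, check=True)` if BLAST+ and HMMER are installed.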
In both cases I get a variable number of hits.
However, I think I am missing something from the biological point of view.
Because BLAST performs local alignment, the hits I obtain could be similar to the query proteins in regions that are not directly involved in the enzymatic reaction. In theory, therefore, these hits may not share the function of the reference proteins.
With an HMM profile, instead, I would mainly be considering the regions of the proteins that are highly conserved and that are likely crucial for the function of the enzyme. This approach should therefore give me hits that are more likely to share the function of the reference proteins.
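One way to check this reasoning on the actual results would be to compare which proteins each method reports. Below is a minimal sketch, assuming BLAST was run with tabular output (`-outfmt 6`) and hmmscan with `--tblout`. The function names, the E-value cutoff, and the example lines are made up for illustration.

```python
def blast_hits(lines, max_evalue=1e-5):
    """Subject IDs from BLAST tabular output (-outfmt 6) passing the E-value cutoff."""
    hits = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Default outfmt 6 columns: index 1 = sseqid, index 10 = evalue.
        if float(fields[10]) <= max_evalue:
            hits.add(fields[1])
    return hits


def hmmscan_hits(lines, max_evalue=1e-5):
    """Sequence IDs from hmmscan --tblout output passing the E-value cutoff."""
    hits = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # In hmmscan tblout the target (index 0) is the profile and the
        # query (index 2) is the scanned sequence; index 4 is the
        # full-sequence E-value.
        if float(fields[4]) <= max_evalue:
            hits.add(fields[2])
    return hits


if __name__ == "__main__":
    # Made-up example lines, not real results.
    blast_lines = [
        "famA\tprot1\t45.0\t200\t100\t5\t1\t200\t1\t200\t1e-30\t120",
        "famB\tprot2\t30.0\t80\t50\t3\t10\t90\t5\t85\t1e-8\t50",
    ]
    hmm_lines = [
        "# target name  accession  query name  accession  E-value  score  bias",
        "family  -  prot1  -  1e-40  150.0  0.1",
        "family  -  prot3  -  1e-12  60.0  0.0",
    ]
    blast_set = blast_hits(blast_lines)
    hmm_set = hmmscan_hits(hmm_lines)
    print("both methods:", sorted(blast_set & hmm_set))
    print("BLAST only:", sorted(blast_set - hmm_set))
    print("HMM only:", sorted(hmm_set - blast_set))
```

Proteins found only by BLAST would be the ones to inspect for similarity outside the conserved regions.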
Am I missing something, or is my reasoning correct?