Dear all :-)
I am trying to compute a conservation score for each position (column) of a protein multiple sequence alignment. I have already used Shannon entropy, but I am not satisfied with it because it only accounts for residue identity, not similarity. So I thought it might be a good idea to use a substitution matrix. I tried to implement two methods:
- the method from "Protein–Protein Interfaces: Analysis of Amino Acid Conservation in Homodimers" (https://doi.org/10.1002/1097-0134(20010101)42:1%3C108::AID-PROT110%3E3.0.CO;2-O)
- the "sum-of-pairs" method from AL2CO (https://doi.org/10.1093/bioinformatics/17.8.700)
The first method gives me results that look wrong (maybe because I used BLOSUM62 instead of the PET91 matrix used in the article). The second method (AL2CO) doesn't give me satisfying results either.
In practice, I would like a score in [0,1] that is sensitive to sequence redundancy. I have a Python workflow that processes my alignment and calculates properties, so I try to avoid external tools as much as possible.
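For concreteness, here is a minimal sketch of the kind of score I mean: a weighted sum-of-pairs over each column, rescaled to [0,1]. Everything in it is an assumption for illustration rather than either published method. The function names are made up, `TOY_MATRIX` is a four-residue toy table standing in for a real matrix, redundancy is handled with Henikoff-style position-based weights, pair scores are normalized by the diagonal and mapped from [-1,1] to [0,1], and gapped positions are simply skipped.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Toy symmetric substitution scores for four residues (illustrative values only;
# a real run would use a full matrix such as BLOSUM62).
TOY_MATRIX = {
    ("I", "I"): 4, ("L", "L"): 4, ("K", "K"): 5, ("R", "R"): 5,
    ("I", "L"): 2, ("K", "R"): 2,
    ("I", "K"): -3, ("I", "R"): -3, ("L", "K"): -2, ("L", "R"): -2,
}

def score(matrix, a, b):
    """Look up a pair score regardless of the order of the two residues."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]

def similarity(matrix, a, b):
    """Diagonal-normalized score mapped to [0, 1]; identical residues give 1."""
    raw = score(matrix, a, b) / sqrt(score(matrix, a, a) * score(matrix, b, b))
    return min(1.0, max(0.0, (raw + 1.0) / 2.0))

def sequence_weights(alignment):
    """Position-based weights: sequences carrying rarer residues weigh more.

    Gaps are counted like any other symbol here, which is a simplification.
    """
    weights = [0.0] * len(alignment)
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        counts = Counter(column)
        for i, res in enumerate(column):
            weights[i] += 1.0 / (len(counts) * counts[res])
    total = sum(weights)
    return [w / total for w in weights]

def column_conservation(alignment, col, weights, matrix):
    """Weighted mean pairwise similarity of the non-gap residues in one column."""
    residues = [(seq[col], w) for seq, w in zip(alignment, weights) if seq[col] != "-"]
    if len(residues) < 2:
        return 0.0  # arbitrary choice for columns with fewer than two residues
    num = den = 0.0
    for (a, wa), (b, wb) in combinations(residues, 2):
        num += wa * wb * similarity(matrix, a, b)
        den += wa * wb
    return num / den

def conservation_profile(alignment, matrix=TOY_MATRIX):
    """Return one score in [0, 1] per alignment column."""
    weights = sequence_weights(alignment)
    return [column_conservation(alignment, c, weights, matrix)
            for c in range(len(alignment[0]))]

# Tiny example: column 0 is invariant, column 1 is conservative, column 2 is mixed.
msa = ["IKI", "IRK", "IKL", "IKR"]
profile = conservation_profile(msa)
print([round(s, 3) for s in profile])
```

With a real alignment, the toy table would be replaced by a full matrix; with Biopython installed, `Bio.Align.substitution_matrices.load("BLOSUM62")` is one way to get it, although the lookup in `score` would then need adapting to that object. The diagonal normalization only makes sense if all diagonal entries are positive.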
Do you have any advice, or maybe know of a hidden magic package that I haven't found :-)?