I am new to this field and a little confused about how to define an overall sequence identity between two proteins.
The difficulty is that one protein can have multiple chains. So far I have compared chain to chain: for proteins A and B, I take, for each chain in A, its maximum similarity to any chain in B, and then take the minimum of those maxima as the overall score. Alternatively, I could concatenate all chain sequences of a protein into a single sequence and compare the two concatenated sequences.
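To make the two options concrete, here is a minimal Python sketch of what I mean. Everything in it is a placeholder of my own: the function names (`pairwise_identity`, `min_of_max_identity`, `concatenated_identity`), the toy scoring (match +1, mismatch -1, gap -1), and the choice to define identity as identical positions divided by alignment length including gap columns, which is only one of several conventions. A real comparison would use a proper aligner and substitution matrix.

```python
def pairwise_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment with toy scores.

    Returns identical positions / alignment length (gap columns included).
    The scoring values are illustrative assumptions, not a standard.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal alignment, counting identical columns.
    i, j = n, m
    identical = aligned = 0
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        aligned += 1
    return identical / aligned if aligned else 0.0


def min_of_max_identity(chains_a, chains_b):
    """For each chain in A take its best match in B, then take the worst of those."""
    return min(max(pairwise_identity(a, b) for b in chains_b) for a in chains_a)


def concatenated_identity(chains_a, chains_b):
    """Join all chains of each protein and align the two joined sequences."""
    return pairwise_identity("".join(chains_a), "".join(chains_b))


protein_a = ["ACDE", "GG"]   # made-up toy chains
protein_b = ["ACDE", "GA"]
print(min_of_max_identity(protein_a, protein_b))
print(concatenated_identity(protein_a, protein_b))
```

One thing I noticed while writing this: the min-of-max score is not symmetric, so comparing A to B can give a different value than comparing B to A, and the concatenated score depends on the order in which the chains are joined.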
However, I think I should probably give more weight to long sequences, because short sequences are more likely to look similar by chance.
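One way I could imagine doing this weighting is a length-weighted average of the per-chain best identities, so that a long chain counts for more than a short peptide. This is only a sketch of the idea, not an established score: the function name `length_weighted_identity` and the example numbers are hypothetical.

```python
def length_weighted_identity(chain_scores):
    """Average per-chain identities, weighted by chain length.

    chain_scores: list of (chain_length, best_identity) pairs, where
    best_identity is a value in [0, 1] computed elsewhere.
    """
    total_length = sum(length for length, _ in chain_scores)
    if total_length == 0:
        return 0.0
    weighted_sum = sum(length * identity for length, identity in chain_scores)
    return weighted_sum / total_length


# Hypothetical example: a 300-residue chain at 40% identity and a
# 10-residue peptide at 100% identity.
scores = [(300, 0.40), (10, 1.00)]
print(length_weighted_identity(scores))
```

With these made-up numbers the unweighted mean would be 0.70, while the length-weighted value stays close to 0.42, so the short peptide no longer dominates the result.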
Is there any canonical way to get the identity/similarity score between two protein sequences?
I could also add protein structure information. But for multi-domain proteins, I could not find a structural score that scales from 0 to 1 or is as easy to interpret as sequence identity.