Question: Which identity to apply when analyzing with BLASTp?
2.5 years ago
wrote:

Hello, I'm using the DIAMOND (Link) program to align my protein sequences against the CAZy database (Link, also protein). My question is: what identity and p-value cutoffs should I apply to get good alignments? I'm afraid that with strict cutoffs I will discard results that would have been satisfactory, and that with lenient cutoffs I will get incorrect results.

My set of protein sequences is derived from shotgun sequencing; the sequences range from 21 to 96 amino acids in length.

Thank you in advance!

Since you seem to be interested in enzymes and you are searching against a specific database, the hits you get should all be relatively good (in BLAST, E-values of 1e-06 or lower generally indicate reasonably strong matches). Hits from your longer sequences (96 AA) should be more reliable than hits from the shorter ones, since short queries may only be matching a single domain.
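If it helps, here is a minimal Python sketch of applying that kind of cutoff after the search. It assumes your results are in the default 12-column BLAST tabular layout (the one DIAMOND writes with `--outfmt 6`). The function name `filter_hits`, the `min_pident` and `min_length` parameters, and the two example rows are made up for illustration; only the 1e-06 E-value default comes from the rule of thumb above, so adjust the rest to your data.

```python
import csv
import io

# Default column order of BLAST/DIAMOND tabular output (outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue=1e-6, min_pident=0.0, min_length=0):
    """Yield hits (as dicts) that pass the given thresholds.

    max_evalue follows the 1e-06 rule of thumb; min_pident and min_length
    are hypothetical extra filters, switched off by default.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if (hit["evalue"] <= max_evalue
                and hit["pident"] >= min_pident
                and hit["length"] >= min_length):
            yield hit


# Two made-up rows, for illustration only (not real CAZy hits).
example = (
    "read_1\tGH5_example\t62.5\t88\t33\t0\t1\t88\t120\t207\t3.1e-30\t118.0\n"
    "read_2\tGT2_example\t41.0\t21\t12\t0\t1\t21\t50\t70\t2.4e-01\t25.4\n"
)
kept = list(filter_hits(io.StringIO(example)))
print([h["qseqid"] for h in kept])  # prints ['read_1']
```

Filtering afterwards rather than during the search means you can run the alignment once with lenient settings and then try several cutoffs on the same table, which addresses the worry about throwing away good hits too early.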

What is your ultimate goal here?

