I have a set of proteins and I need to find homologous partners for them. I wanted to automate the search, so I wrote a Perl script for it. My question is what percent identity (both minimum and maximum) I should use when searching for homologous sequences. Also, the database I am using for blastp is PDB, since I only need sequences that have structures. Kindly help me out with this.
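I recall that 30% identity is an empirical cutoff for protein sequence similarity. Below roughly that level (the so-called "twilight zone"), identity alone becomes an unreliable sign of homology.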
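One way to apply an identity window in an automated pipeline is to filter the hits after the search. Below is a minimal Python sketch of that step. The question mentions a Perl script, but the logic ports directly. It assumes blastp was run with tabular output (-outfmt 6) in its default 12-column layout: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The names Hit, parse_outfmt6 and in_identity_window are hypothetical. The default window of 30 to 100% simply mirrors the 30% figure above. The upper bound is your choice, for example lowering it to drop near-identical PDB entries of the query itself.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float   # percent identity over the aligned region only, not the full length
    length: int     # alignment length
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield hits from BLAST tabular output (assumes the default 12-column -outfmt 6)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comment lines
            continue
        f = line.split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11]))


def in_identity_window(hits: Iterable[Hit], min_id: float = 30.0,
                       max_id: float = 100.0) -> List[Hit]:
    """Keep hits whose percent identity lies within [min_id, max_id] (hypothetical defaults)."""
    return [h for h in hits if min_id <= h.pident <= max_id]
```

For example, a hypothetical run such as blastp -query my_proteins.fasta -db pdbaa -outfmt 6 -evalue 1e-3 -out hits.tsv produces a file whose lines can be passed to parse_outfmt6 via open("hits.tsv"). pdbaa is NCBI's preformatted PDB protein database; the file names here are placeholders.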
If you use BLAST, the E-value is a better indicator of homology than percent identity, because it takes into account the length of the query and the size of what was searched (the subject sequence, or the whole database). For example, a short protein can look quite similar to an unrelated protein simply by chance. In that case a low E-value is much stronger evidence than a high identity. As far as I know, there is no standard E-value cutoff. You can try values from 1e-2 (IMG's protocol) down to 1e-10 or even lower.
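The length dependence comes from the Karlin-Altschul statistics behind BLAST. In simplified form, with m the query length, n the total length of the database (or subject), S the raw alignment score, and K and lambda constants set by the scoring matrix and gap penalties, the expected number of chance hits scoring at least S is given below. The second form uses the normalized bit score S'. This is a sketch only: BLAST also applies effective-length corrections that are omitted here.

```latex
E = K \, m \, n \, e^{-\lambda S}
\qquad\text{equivalently}\qquad
E = m \, n \, 2^{-S'}, \quad S' = \frac{\lambda S - \ln K}{\ln 2}
```

Two consequences follow. For a fixed score, searching a database twice as large doubles E. A short query can only reach a small S, so a high identity over a few residues may still give an unimpressive E-value.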
On the other hand, there are a bunch of programs that identify orthologs using more sophisticated algorithms, such as OrthoMCL. You could try those.