I have a file of about 600 translated protein sequences that I'd like to name automatically based on their homology. If I BLAST them against the nr database, the hits often don't have consistently annotated names. For example, one protein could hit something named 'NADH-ubiquinone reductase' and another 'complex 1 dehydrogenase'. These are the same protein (KEGG: http://www.genome.jp/dbget-bin/www_bget?ec:220.127.116.11), but they have been named differently depending on who submitted them to NCBI.
Is there some kind of 'official' name that can be assigned to proteins, so that I can quickly screen through them and see whether a particular protein is present, or how many copies of it there are? At the moment it's difficult to know whether my searches are exhaustive, because I might be looking for a protein under one name when it's actually called something completely different.
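To make the goal concrete: once every sequence carries one consistent identifier (an ortholog group ID, say), the screening I have in mind reduces to a simple tally. Here is a minimal Python sketch of that step. It assumes a hypothetical two-column, tab-separated file of protein ID and group ID (e.g. `contig12_orf3<TAB>OG0001`). That layout is made up for illustration and is not the output format of OrthoMCL, eggNOG or any other tool.

```python
import csv
import io
from collections import Counter, defaultdict


def load_assignments(handle):
    """Read (protein_id, group_id) pairs from a hypothetical two-column TSV.

    Blank lines, short rows and lines starting with '#' are skipped.
    Proteins that received no group are assumed to be absent from the file.
    """
    reader = csv.reader(handle, delimiter="\t")
    return [
        (row[0], row[1])
        for row in reader
        if len(row) >= 2 and not row[0].startswith("#")
    ]


def group_members(pairs):
    """Map each group ID to the list of protein IDs assigned to it."""
    members = defaultdict(list)
    for protein_id, group_id in pairs:
        members[group_id].append(protein_id)
    return dict(members)


def copy_numbers(pairs):
    """Count how many of the proteins fall into each group (copy number)."""
    return Counter(group_id for _, group_id in pairs)


if __name__ == "__main__":
    # Hypothetical example data: three proteins, two groups.
    example = "# protein\tgroup\np1\tOG0001\np2\tOG0001\np3\tOG0002\n"
    pairs = load_assignments(io.StringIO(example))
    counts = copy_numbers(pairs)
    print(counts["OG0001"])                       # copies in group OG0001
    print(group_members(pairs).get("OG0999", []))  # empty list means absent
```

With a shared identifier, "is protein X present, and how many copies?" becomes a lookup on the group ID rather than a search over free-text names.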
Someone suggested assigning each protein to a group of orthologs, for example using something like OrthoMCL or eggNOG, but I'm struggling to understand how to use these tools, especially for several hundred sequences.
If anyone could suggest a strategy, or give some indication of how to use these ortholog databases for the purpose I've described, I'd greatly appreciate the help. Cheers!