I assembled two different genomes and wanted to see how similar they are at both the nucleotide and protein levels, so I aligned their nucleotide sequences and their translated nucleotide sequences. Here are the results I obtained:
Nucleotide identity = 90%; protein identity = 57%
How would one make sense of this result of high nucleotide identity but low protein identity? I have been doing a lot of reading, and it seems that if the species are close it's better to compare the DNA sequences, and I believe these two species should be fairly close. However, I am still confused as to why the values differ so much.
Thanks for your input!
There are lots of reasons for this, and all else being equal this is to be expected.
You need to clarify whether these are DNA sequences of individual genes, whole genomes, or something else.
Sorry, I should have clarified. Whole genomes!
It doesn't make any sense to translate a whole genome, and it makes even less sense to align or compare the resulting translations.
Exactly!! Only translate and compare protein-coding regions. For non-coding regions, DNA similarity can be high, but when ERRONEOUSLY translated, the "protein" sequences could come from different reading frames and therefore show very low similarity. Again, only translate and compare protein-coding regions.
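To illustrate the reading-frame point, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy sequence is invented, the helper names `translate` and `positional_identity` are hypothetical, and the identity measure is a crude ungapped, position-by-position comparison rather than a real alignment. It uses the standard genetic code and shows that two regions with identical DNA can give unrelated "proteins" if one of them starts a single base later.

```python
# Standard genetic code, built in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def translate(dna, frame=0):
    """Translate a DNA string in a forward frame (0, 1 or 2).

    Stop codons become '*', unknown codons become 'X', and a trailing
    incomplete codon is ignored.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def positional_identity(a, b):
    """Percent of identical positions, ungapped (NOT a real alignment)."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))


# Invented toy region; region_b is the same DNA, starting one base later,
# as could happen when two assemblies begin at different coordinates.
region_a = "ATGGCCAAGCTGACCGGTGAACGTTTCGATCCG"
region_b = region_a[1:]

pep_a = translate(region_a)
pep_b = translate(region_b)

print(region_b in region_a)               # True: the DNA is identical
print(pep_a)                              # MAKLTGERFDP
print(pep_b)                              # WPS*PVNVSI
print(positional_identity(pep_a, pep_b))  # 0.0
```

Note the stop codon (`*`) in the second translation: it is a sign that the sequence is being read in a frame that does not encode a protein. In practice you would extract annotated or predicted coding sequences first, translate each one in its own frame, and compare those.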