I need to find the percent identity between every pair of orthologous genes in 4 different (but closely related) bacteria.
My dataset consists of the nucleotide sequences of every gene (strictly speaking, every ORF) in the genomes, plus the information on which genes are orthologous to which (based on OrthoMCL results). There are ~1500 orthologous groups, so in the end I hope to have ~1500 tables, each showing the percent identity among the genes in one group. Even better would be ~1500 percent identity ranges (the lowest and highest value in each group), since those are what I'm really after.
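To make the desired output concrete, here is a rough sketch of the per-group table and range I have in mind. Everything in it is made up for illustration: the gene IDs, the function names, and especially `naive_identity`, which is only a placeholder that assumes the sequences are already aligned to the same length. A real solution would plug in an alignment-based identity function instead.

```python
from itertools import combinations


def naive_identity(a, b):
    """Placeholder only: percent of matching positions.

    Assumes a and b are ALREADY aligned to equal length, which raw
    gene sequences generally are not.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def identity_table(group, identity_fn=naive_identity):
    """Map every unordered pair of gene IDs in one group to a percent identity."""
    return {
        (g1, g2): identity_fn(group[g1], group[g2])
        for g1, g2 in combinations(sorted(group), 2)
    }


def identity_range(table):
    """Return (lowest, highest) percent identity found in one group's table."""
    values = list(table.values())
    return min(values), max(values)


if __name__ == "__main__":
    # Hypothetical orthologous group: gene ID -> nucleotide sequence.
    group = {
        "strainA_gene1": "ATGAAA",
        "strainB_gene7": "ATGAAG",
        "strainC_gene3": "ATGACA",
    }
    table = identity_table(group)
    for pair, pct in table.items():
        print(pair, f"{pct:.1f}")
    low, high = identity_range(table)
    print(f"range: {low:.1f}-{high:.1f}")
```

Repeating this over all ~1500 groups would give one table and one range per group.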
Is there software that does this? (Sorry, I haven't searched for it myself because I don't even know what to search for.)
If no such software exists, I'm thinking of building it myself, since I'm learning Python. Any suggestions? I'm thinking of using a global alignment algorithm such as Needleman–Wunsch, preferably on Windows (it is the only option available to me, but please don't hesitate to answer if you have a Linux solution).
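For reference, this is roughly the approach I mean: a minimal pure-Python sketch of Needleman–Wunsch with a linear gap penalty, followed by a percent identity calculation. The scoring values (match = 1, mismatch = -1, gap = -2) and the function names are my own assumptions, not taken from any tool, and I am defining percent identity as matches divided by alignment length including gap columns, which is only one of several definitions in use.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return the two gapped strings.

    Scoring values are illustrative assumptions; gaps use a simple
    linear penalty (no separate gap-open cost).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Percent of alignment columns (gap columns included) with identical bases."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        return 0.0
    # A column never holds two gaps, so x == y implies a real base match.
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    print(needleman_wunsch("ATGC", "AGC"))
    print(percent_identity("ATGC", "ATGG"))
```

The score matrix grows with the product of the two sequence lengths, so a pure-Python version like this may be too slow for full-length genes across all ~1500 groups.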
(Edited to explain OS choice)