I am trying to create a substitution matrix for some bacterial species. As a preliminary step, I searched for orthologs (protein sequences) using OrthoMCL. The program outputs a groups.txt file containing groups of sequences clustered on the basis of similarity. The next step is to align the orthologs pairwise and build the substitution matrix from those alignments.
My question is: should I use the ortholog groups listed in groups.txt, or the ortholog pairs listed in orthologs.txt in the pairs folder of the output?
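To make the first option concrete, here is a minimal sketch of how a group could be expanded into pairs for alignment. It assumes each line of groups.txt has the form `group_id: taxon|protein taxon|protein ...`; the function names and the example line (`grp1000`, `eco`, `sty` and the protein IDs) are made up for illustration and are not part of OrthoMCL.

```python
from itertools import combinations


def parse_group_line(line):
    """Split one groups.txt line into (group_id, [(taxon, protein_id), ...]).

    Assumes the layout 'group_id: taxon|protein taxon|protein ...'.
    """
    group_id, _, members = line.partition(":")
    parsed = []
    for member in members.split():
        taxon, _, protein = member.partition("|")
        parsed.append((taxon, protein))
    return group_id.strip(), parsed


def cross_species_pairs(members):
    """Return every pair in a group whose two members come from different taxa."""
    return [(a, b) for a, b in combinations(members, 2) if a[0] != b[0]]


# Hypothetical group line, for illustration only.
example = "grp1000: eco|b0001 sty|STM0001 eco|b0002"
group_id, members = parse_group_line(example)
pairs = cross_species_pairs(members)

for (taxon_a, prot_a), (taxon_b, prot_b) in pairs:
    print(group_id, f"{taxon_a}|{prot_a}", f"{taxon_b}|{prot_b}")
```

Expanding a group this way gives every cross-species combination in the cluster, so a group with several members from one species contributes more pairs than it would have entries in orthologs.txt. That difference is essentially what I am unsure about.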