10.3 years ago
JstRoRR
Hi,
I am trying to create a substitution matrix for some bacterial species. As a preliminary step, I searched for orthologs (protein sequences) using OrthoMCL. The program outputs a groups.txt file containing groups of sequences clustered on the basis of similarity. The next step would be to align the orthologs pairwise and build a substitution matrix from those alignments.
My question is: should I use the ortholog groups in the groups.txt file, or the ortholog pairs in the pairs folder (orthologs.txt) of the output?
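If you go with groups.txt, each group has to be expanded into pairs before pairwise alignment, and a group can also contain several in-paralogs from the same species, so it yields more (and less strictly one-to-one) pairs than a list of ortholog pairs does. A minimal sketch of that expansion, assuming each groups.txt line looks like `group_id: taxon|seq_id taxon|seq_id ...` (check this against your own file); the function names and the example IDs are hypothetical:

```python
from itertools import combinations


def parse_groups_line(line):
    """Split one groups.txt line into (group_id, [(taxon, seq_id), ...]).

    Assumes the layout 'group_id: taxon|seq_id taxon|seq_id ...'.
    """
    group_id, members = line.split(":", 1)
    parsed = []
    for token in members.split():
        taxon, seq_id = token.split("|", 1)
        parsed.append((taxon, seq_id))
    return group_id.strip(), parsed


def cross_species_pairs(members):
    """All unordered member pairs from different taxa (same-taxon pairs are skipped)."""
    return [(a, b) for a, b in combinations(members, 2) if a[0] != b[0]]


# Hypothetical example: two E. coli in-paralogs grouped with one B. subtilis gene.
line = "grp1: ecoli|b0001 ecoli|b0002 bsub|BSU0001"
gid, members = parse_groups_line(line)
pairs = cross_species_pairs(members)
```

Here `pairs` holds two cross-species pairs, each E. coli copy with the B. subtilis gene, whereas a strict one-to-one ortholog list might keep only one of them.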
Be warned that OrthoMCL does not output singleton clusters, i.e. genes that are unique and were not clustered with any other sequence.
Other tools you might want to consider are ProteinOrtho5, kClust and cd-hit, which can do similar things (and much faster).
Thanks, Torst, for suggesting alternative tools. I will go through them.
I have one more question: is there any tool that can create a substitution matrix from multiple alignments, or do I have to do it manually or write my own script?
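If no ready-made tool fits, the core counting step is short enough to script yourself. Below is a minimal sketch of the BLOSUM-style approach: count every residue pair within each alignment column, convert the counts to observed frequencies q_ij, derive background frequencies p_i from them, and score each pair as scale * log2(q_ij / e_ij), where e_ij = p_i * p_i for identical residues and 2 * p_i * p_j otherwise. It assumes the input is a list of equal-length aligned strings with `-` for gaps; the function names and the half-bit scale of 2 are illustrative choices, not a specific tool's API.

```python
import math
from collections import Counter
from itertools import combinations


def pair_counts(alignment):
    """Count residue pairs in every column of a gapped alignment.

    alignment: list of equal-length strings; '-' marks a gap and is ignored.
    Pairs are unordered, so ('A', 'S') and ('S', 'A') share one key.
    """
    counts = Counter()
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        for a, b in combinations(residues, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts


def log_odds_matrix(counts, scale=2.0):
    """Turn pair counts into rounded scores: scale * log2(q_ij / e_ij)."""
    total = sum(counts.values())
    q = {pair: n / total for pair, n in counts.items()}
    # Background frequency p_i: q_ii plus half of every q_ij with i != j.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(scale * math.log2(freq / expected))
    return scores


counts = pair_counts(["AS", "AA"])
scores = log_odds_matrix(counts)
```

The same functions work on a pairwise alignment (two strings) or on a multiple alignment. Note that this sketch skips what real BLOSUM construction also does, namely clustering sequences above an identity threshold and down-weighting them, and that residue pairs never observed get no score at all unless you add pseudocounts.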
OrthoMCL does provide a Perl script to extract singletons.
OrthoMCL just lists the sequences that are not present in the groups file, but whether each of them is a true singleton is questionable, as some of these remaining sequences may still have orthologs among themselves.
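The idea behind that leftover list can be sketched in a few lines: collect every sequence ID that appears in the groups file and report the FASTA IDs that are missing from it. This is a minimal illustration, not the OrthoMCL script itself; it assumes groups.txt lines of the form `group_id: taxon|seq_id ...` and FASTA headers whose first word is the same `taxon|seq_id` token, and all names and IDs in it are hypothetical.

```python
def grouped_ids(group_lines):
    """Collect every 'taxon|seq_id' token listed in groups.txt-style lines."""
    ids = set()
    for line in group_lines:
        if ":" not in line:
            continue  # skip blank or malformed lines
        _, members = line.split(":", 1)
        ids.update(members.split())
    return ids


def fasta_ids(fasta_lines):
    """Return header IDs (first word after '>') in file order."""
    return [line[1:].split()[0] for line in fasta_lines if line.startswith(">")]


def singleton_candidates(fasta_lines, group_lines):
    """FASTA IDs that do not occur in any group."""
    grouped = grouped_ids(group_lines)
    return [seq_id for seq_id in fasta_ids(fasta_lines) if seq_id not in grouped]


fasta = [">ecoli|b0001 desc", "MKV", ">ecoli|b0003", "MAA", ">bsub|BSU0001", "MKI"]
groups = ["grp1: ecoli|b0001 bsub|BSU0001"]
leftovers = singleton_candidates(fasta, groups)
```

Because some of these leftovers may still have orthologs among themselves, the result is best treated as a list of candidates rather than confirmed singletons.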