Question: What to do with the clustered groups of proteins from the OrthoMCL tool?
5.9 years ago by
JstRoRR60 wrote:


I am trying to create a substitution matrix for some bacterial species. As a preliminary step, I have searched for orthologs (protein sequences) using OrthoMCL. The program outputs a groups.txt file containing groups of sequences clustered on the basis of similarity. The next step would be to pairwise align the orthologs and create a substitution matrix.

My question is: should I use the orthologs grouped by the program in the groups.txt file, or should I use the paired orthologs in the pairs folder (orthologs.txt) of the output?
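
To make the two options concrete, here is a minimal Python sketch that reads group lines and lists the cross-species pairs inside each group. It assumes each line of groups.txt looks like "groupid: taxon|protein taxon|protein ..."; the group IDs, taxon codes, and protein IDs below are made up for illustration, and the helper names are hypothetical. Treat the result as candidate pairs only, since a group can also contain several members from the same species.

```python
from itertools import combinations

def parse_groups(lines):
    """Map group ID -> list of member IDs (assumed 'taxon|protein' form)."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, _, members = line.partition(":")
        groups[group_id.strip()] = members.split()
    return groups

def cross_species_pairs(members):
    """Yield member pairs whose taxon prefixes differ."""
    for a, b in combinations(members, 2):
        if a.split("|", 1)[0] != b.split("|", 1)[0]:
            yield a, b

# Made-up example lines in the assumed groups.txt layout
example = [
    "grp1000: spA|p1 spB|p7 spA|p2",
    "grp1001: spB|p9 spC|p4",
]
groups = parse_groups(example)
pairs = {gid: list(cross_species_pairs(m)) for gid, m in groups.items()}
```

For a real run, the example list would be replaced by the lines of the actual groups.txt file, and each pair would then go to a pairwise aligner.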



genome • 2.8k views

Be warned that OrthoMCL does not output singleton clusters, i.e. genes that are unique and were not grouped with any other gene.

Other tools you might want to consider are ProteinOrtho5, kClust, and cd-hit, which can do similar things (and much faster).


written 5.9 years ago by Torst950

Thanks, Torst, for suggesting alternative tools. I will go through them.

I have one more question, in case you have any idea: is there a tool that can create a substitution matrix from multiple alignments? Or do I have to do it manually or write a fresh script for that?
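
If it does come down to a fresh script, the core step is short. Below is a minimal Python sketch of a BLOSUM-style log-odds calculation from one gap-aware alignment block: count residue pairs per column, then score each pair as log2(observed / expected). It is only a sketch under simplifying assumptions: there is no sequence clustering or weighting, no pseudocounts, and no scaling or rounding of scores, and the function names and toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2

def pair_counts(alignment, gap="-"):
    """Count unordered residue pairs over all columns of an aligned block."""
    counts = Counter()
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        for a, b in combinations(residues, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts

def log_odds(counts):
    """Convert pair counts to log2(observed / expected) scores."""
    total = sum(counts.values())
    # Observed pair frequencies q_ab
    q = {pair: n / total for pair, n in counts.items()}
    # Background residue frequencies p_a derived from the pair frequencies
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2
    scores = {}
    for (a, b), freq in q.items():
        # Expected frequency: p_a^2 for identical pairs, 2 * p_a * p_b otherwise
        expected = p[a] * p[a] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = log2(freq / expected)
    return scores

# Toy alignment of three equal-length sequences (made up for illustration)
toy = ["AC", "AC", "AG"]
counts = pair_counts(toy)
scores = log_odds(counts)
```

Pairs that never occur in the alignments get no score here, so a real matrix would need many alignment blocks (or pseudocounts) to cover all residue pairs.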

written 5.9 years ago by JstRoRR60

OrthoMCL does provide a Perl script to extract singletons.

written 5.7 years ago by amanjain0

OrthoMCL just gives the list of sequences not present in the groups file, but whether each of these is a true singleton is questionable, as the remaining sequences may still have orthologs among them.
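
The "not present in the groups file" step amounts to a set difference, which the following minimal Python sketch illustrates. It assumes group lines of the form "groupid: member member ..." and a separate list of all input sequence IDs; the IDs and function names are hypothetical. As noted above, the leftovers are only candidate singletons and would need a further check.

```python
def grouped_ids(group_lines):
    """Collect every member ID that appears in some group line."""
    ids = set()
    for line in group_lines:
        _, sep, members = line.partition(":")
        if sep:  # skip lines without a 'groupid:' prefix
            ids.update(members.split())
    return ids

def ungrouped(all_ids, group_lines):
    """Return IDs absent from every group, i.e. candidate singletons."""
    return sorted(set(all_ids) - grouped_ids(group_lines))

# Made-up IDs for illustration
all_ids = ["spA|p1", "spA|p2", "spB|p7", "spC|p4"]
group_lines = ["grp1000: spA|p1 spB|p7"]
leftover = ungrouped(all_ids, group_lines)
```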

written 5.0 years ago by Nari880