Picking one representative sequence from an OrthoMCL cluster
7.3 years ago
Sourabh J

Hi all, I have clustered around 30,000 genes from roughly 90 genomes into orthologous groups using OrthoMCL. I want to investigate the evolutionary forces acting on particular orthologous clusters. To do this, I selected the longest sequence from each orthologous cluster (to increase phylogenetic spread) as the representative of that group and queried it against the BLAST nr database to find homologs. Is choosing one representative sequence (the longest) from each group the right way to find potential homologs? Or is there another way to choose a representative member of an orthologous cluster?

Thanks in advance

regards

Tags: blast, genome, sequence
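A minimal Python sketch of the approach described in the question: keep the longest member of each group and write all representatives to one FASTA file, so a single BLAST run covers every group. It assumes the OrthoMCL groups file has one group per line in the form `GROUP_ID: taxon|geneid taxon|geneid ...` and that the FASTA headers start with the same `taxon|geneid` identifiers; the function names are hypothetical helpers, not part of OrthoMCL.

```python
"""Sketch: keep the longest member of each OrthoMCL group as its representative.

Assumed input formats (check against your own files):
  groups file: "GROUP_ID: taxon|geneid taxon|geneid ..."
  FASTA:       headers whose first word is the same "taxon|geneid"
"""
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record ID (first word after '>') to its full sequence."""
    seqs: Dict[str, str] = {}
    current = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                seqs[current] = "".join(chunks)
            current = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)  # sequences may span several lines
    if current is not None:
        seqs[current] = "".join(chunks)
    return seqs


def read_groups(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (group_id, member_ids) for each line of the groups file."""
    for line in lines:
        if ":" not in line:
            continue
        group_id, members = line.split(":", 1)
        yield group_id.strip(), members.split()


def longest_per_group(
    groups: Iterable[Tuple[str, List[str]]], seqs: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return {group_id: (member_id, sequence)} for the longest member."""
    reps: Dict[str, Tuple[str, str]] = {}
    for group_id, members in groups:
        present = [m for m in members if m in seqs]  # skip IDs missing from the FASTA
        if not present:
            continue
        # Longest sequence wins; ties go to the alphabetically first ID.
        best = min(present, key=lambda m: (-len(seqs[m]), m))
        reps[group_id] = (best, seqs[best])
    return reps


def write_fasta(reps: Dict[str, Tuple[str, str]], handle: TextIO) -> None:
    """Write one record per group; the group ID is the first word of the header."""
    for group_id, (member, seq) in sorted(reps.items()):
        handle.write(f">{group_id} {member}\n{seq}\n")
```

Because the group ID is the first word of each header, it becomes the query ID in BLAST output, so every hit maps straight back to its orthologous group. The resulting file can then be searched in one run, e.g. `blastp -query representatives.fasta -db nr -remote -outfmt 6` (the file name is a placeholder).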

What exactly are you trying to do, and why do you need to do that? If sequences are in the same orthologous group, then in theory they should all be equally representative of it; otherwise they aren't really homologs, right?

You could use CD-HIT to cluster the sequences in each ortholog group and let it pick a representative sequence for you, but I'm still not 100% sure what the objective is here.
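CD-HIT sorts sequences by decreasing length and makes the longest sequence of each cluster its representative, so on a single ortholog group it will usually pick the same sequence as the longest-sequence approach. The representatives are written to the FASTA given with `-o` (e.g. `cd-hit -i OG_1.fasta -o OG_1_reps.fasta -c 0.9 -d 0`; file names are placeholders), and the accompanying `.clstr` file lists every cluster, marking its representative with `*`. The Python sketch below parses that `.clstr` file to map members to representatives. It assumes the usual layout (a `>Cluster 0` header followed by lines like `0	120aa, >name... *`) and that full names were kept: CD-HIT truncates names in `.clstr` by default, and its `-d 0` option keeps them up to the first space. `Cluster` and `parse_clstr` are hypothetical helpers.

```python
"""Sketch: read a CD-HIT .clstr file and report each cluster's representative."""
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Cluster:
    representative: Optional[str] = None
    members: List[str] = field(default_factory=list)


def parse_clstr(lines: Iterable[str]) -> Dict[str, Cluster]:
    """Map cluster number -> Cluster, using the '*' marker to find representatives."""
    clusters: Dict[str, Cluster] = {}
    current: Optional[Cluster] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = Cluster()
            clusters[line.split()[1]] = current
            continue
        if current is None:
            raise ValueError("member line before the first '>Cluster' header")
        # Member line, e.g. "1\t95aa, >taxB|g2... at 92.63%"
        _, rest = line.split(">", 1)           # drop "index<TAB>length, "
        name, _, tail = rest.partition("...")  # name ends at the first "..."
        current.members.append(name)
        if tail.strip() == "*":                # "*" marks the representative
            current.representative = name
    return clusters
```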

Thanks for the answer. My goal is to build a phylogenetic tree for each ortholog group (OG) by finding its homologs in the nr database. For example, if I have 50 genes in one OG, then BLASTing each gene individually against nr to find homologs would be tedious and time-consuming. So I want to select one representative member of each OG and BLAST it against nr.

Well, strictly speaking, it shouldn't really matter which ortholog you pick from a given group then: if they are all sufficiently similar to one another (which depends on the similarity cutoffs you used for inclusion in an ortholog group), you'd expect BLASTing any ortholog in that group to return the same hits.

I have no idea whether you'd see much difference between BLASTing the longest and the shortest sequence in an ortholog group, though. I guess the longer one might hit more sequences in total. I like Jean-Karim's suggestion of making HMMs from the clusters, since that way you capture the information from all of their members.
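Whether any member returns the same hits can be checked on a few groups before settling on one representative: BLAST two or three members (for example the longest and the shortest) and compare their hit sets. A Python sketch, assuming BLAST tabular output (`-outfmt 6`, whose default 11th column is the e-value); the 1e-5 cutoff and the function names are illustrative choices, not recommendations.

```python
"""Sketch: compare the BLAST hit sets of several members of one ortholog group."""
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def hits_by_query(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Set[str]]:
    """Collect subject IDs per query from -outfmt 6 lines, keeping hits <= max_evalue."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        qseqid, sseqid, evalue = cols[0], cols[1], float(cols[10])
        if evalue <= max_evalue:
            hits[qseqid].add(sseqid)
    return dict(hits)  # queries with no passing hit are absent


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared hits divided by all hits; 1.0 means identical hit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_overlap(hits: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard index for every pair of queried members."""
    return {(q1, q2): jaccard(hits[q1], hits[q2])
            for q1, q2 in combinations(sorted(hits), 2)}
```

An index close to 1 supports using a single representative for that group; a low index, or a much larger hit count (`len(hits[q])`) for the longer query, suggests the choice of representative does matter there.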

Thanks for answering and clearing up my doubts.

7.3 years ago

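Instead of using one representative sequence, I think using an HMM to represent each cluster would be more robust. You can do this with hmmbuild and hmmsearch from HMMER: align each cluster, build a profile HMM from the alignment with hmmbuild, then search the database with hmmsearch.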
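A Python sketch of this route, calling the command-line tools through `subprocess`. It assumes MAFFT is used to align each cluster (any aligner that writes aligned FASTA would do), that `hmmbuild` and `hmmsearch` from HMMER 3 are on the `PATH`, and that the target database is a local protein FASTA file; searching all of nr this way requires downloading it, which is large. `HmmerJob`, `plan_job`, `parse_tblout` and `run_job` are hypothetical helpers. `hmmsearch --tblout` writes one whitespace-separated line per target sequence, which `parse_tblout` reduces to (target, query, full-sequence E-value, score).

```python
"""Sketch: align one ortholog group, build a profile HMM, search a protein database.

Assumes mafft, hmmbuild and hmmsearch (HMMER 3) are installed and on PATH.
"""
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Tuple


@dataclass
class HmmerJob:
    align_cmd: List[str]  # MAFFT writes the alignment to stdout
    aln_path: Path
    build_cmd: List[str]
    search_cmd: List[str]
    tbl_path: Path


def plan_job(group_fasta: Path, seqdb: Path, evalue: float = 1e-5) -> HmmerJob:
    """Build the three command lines for one group without running anything."""
    def sibling(ext: str) -> Path:
        # OG_1.fasta -> OG_1.aln, OG_1.hmm, ... (dots inside the stem are kept)
        return group_fasta.with_name(group_fasta.stem + ext)

    aln, hmm, tbl = sibling(".aln"), sibling(".hmm"), sibling(".tbl")
    out = sibling(".hmmsearch.txt")  # full human-readable hmmsearch report
    return HmmerJob(
        align_cmd=["mafft", "--auto", str(group_fasta)],
        aln_path=aln,
        build_cmd=["hmmbuild", "--informat", "afa", str(hmm), str(aln)],
        search_cmd=["hmmsearch", "--tblout", str(tbl), "-E", str(evalue),
                    "-o", str(out), str(hmm), str(seqdb)],
        tbl_path=tbl,
    )


def parse_tblout(lines: Iterable[str]) -> List[Tuple[str, str, float, float]]:
    """Return (target, query, full-sequence E-value, score) for each hit line."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()
        hits.append((cols[0], cols[2], float(cols[4]), float(cols[5])))
    return hits


def run_job(job: HmmerJob) -> List[Tuple[str, str, float, float]]:
    """Run align -> build -> search, then parse the per-target table."""
    with open(job.aln_path, "w") as aln_out:
        subprocess.run(job.align_cmd, stdout=aln_out, check=True)
    subprocess.run(job.build_cmd, check=True)
    subprocess.run(job.search_cmd, check=True)
    with open(job.tbl_path) as fh:
        return parse_tblout(fh)
```

Splitting planning from running makes it easy to print or batch the commands (for example on a cluster scheduler) before committing to a long search; `run_job(plan_job(Path("OG_1.fasta"), Path("nr.fasta")))` would run one group end to end, with placeholder file names.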

Thanks, Jean-Karim. This will surely help.
