Question: Clustering amino acid sequences to detect homologies
4.3 years ago, ddowlin70 wrote:

Hi all,

Basic question: I would like to group a set of amino acid sequences into clusters that reflect their evolutionary relationships.

I have a set of about 40 amino acid sequences from four yeast species. I want to know whether any of them are homologs (either orthologs between species or paralogs within a species). The 40 sequences include 4 sequences, one from each species, that I identified as orthologs using phmmer. Additionally, I added three known mammalian orthologs as a control.
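
One alternative to reading clusters off a single MSA is to score every pair of sequences and link the pairs whose similarity passes a cutoff, so the 40 sequences fall into connected groups. The sketch below is a minimal, standard-library-only illustration under strong assumptions: similarity is plain percent identity from a toy Needleman-Wunsch global alignment (match +1, mismatch -1, linear gap -1, no BLOSUM matrix), and the names `global_identity`, `single_linkage_clusters`, and the toy sequences are hypothetical. A real homology screen would more likely use substitution-matrix scoring with E-values (e.g. an all-against-all phmmer or BLASTP run) rather than raw identity.

```python
from __future__ import annotations

from itertools import combinations


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Toy Needleman-Wunsch alignment (assumed linear scoring, no substitution
    matrix). Returns identical aligned positions / alignment length."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back one optimal path, counting identical columns and total columns.
    i, j, identical, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            identical += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        length += 1
    return identical / length if length else 0.0


def single_linkage_clusters(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Link every pair with identity >= threshold; return connected components
    (hypothetical helper, union-find over all pairs)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x, y in combinations(seqs, 2):
        if global_identity(seqs[x], seqs[y]) >= threshold:
            parent[find(x)] = find(y)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    toy = {"yA": "MKTAYIAKQR", "yB": "MKTAYIAKQK", "yC": "GPWCNDEGPW"}
    for cluster in single_linkage_clusters(toy, threshold=0.5):
        print(sorted(cluster))
```

Single linkage is transitive: if A links to B and B links to C, all three end up together even when A and C are not directly similar. That can pull in divergent family members, but it can also merge unrelated families through a shared domain.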

I was advised to use Clustal Omega to align the sequences and then identify clusters from the resulting cladogram. However, I am unsure how valid this method is when many of the sequences may not be homologous. How can we trust the resulting MSA, or any phylogeny based on it?

I used four aligners (Clustal Omega, MAFFT, T-Coffee, and MUSCLE). Each gives a different tree topology, although the three mammalian sequences cluster together in all four trees and the four yeast homologs cluster together in two of them.
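
Since the four aligners give different tree topologies, it may help to quantify how different they are. A common measure is the Robinson-Foulds distance: the number of bipartitions (splits of the leaf set induced by internal branches) present in one unrooted tree but not the other. The sketch below assumes the trees have already been loaded as nested Python tuples over the same leaf names (parsing each tool's Newick output is left out), and `leaves`, `splits`, and `robinson_foulds` are hypothetical helper names.

```python
from typing import Union

# A tree is either a leaf name or a tuple of subtrees (assumed representation).
Tree = Union[str, tuple]


def leaves(t: Tree) -> frozenset:
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaves(child) for child in t))


def splits(t: Tree) -> set:
    """Non-trivial bipartitions of an unrooted tree. Each split is stored as
    the side that does NOT contain a fixed reference leaf, so both
    orientations of the same split compare equal."""
    all_leaves = leaves(t)
    ref = min(all_leaves)
    found: set = set()

    def walk(node: Tree) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        # Skip trivial splits (a single leaf, or everything but one leaf).
        if 1 < len(below) < len(all_leaves) - 1:
            found.add(all_leaves - below if ref in below else below)
        return below

    walk(t)
    return found


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must share the same leaf set")
    return len(splits(t1) ^ splits(t2))


if __name__ == "__main__":
    tree_a = ((("A", "B"), "C"), (("D", "E"), "F"))
    tree_b = ((("A", "C"), "B"), (("D", "E"), "F"))
    print(robinson_foulds(tree_a, tree_b))
```

A distance of 0 means identical unrooted topologies. The measure ignores branch lengths and support values, so on its own it says nothing about whether a recurring group (such as the mammalian clade) is well supported.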

I have also tried CD-HIT, with the sequence identity threshold lowered to 0.3. With this method, the only cluster identified is the one made up of the three mammalian sequences.
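
CD-HIT clusters greedily. Sequences are sorted by decreasing length, the longest becomes the first representative, and each later sequence joins the first representative it matches at or above the identity threshold; otherwise it starts a new cluster. The sketch below imitates that loop in plain Python so the effect of a pairwise identity cutoff is easy to see. It is not CD-HIT: `approx_identity` is a hypothetical stand-in that counts residues in `difflib.SequenceMatcher` matching blocks (not an optimal alignment, and no short-word filter) and divides by the length of the shorter sequence. `greedy_cluster` and the toy sequences are also hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def approx_identity(a: str, b: str) -> float:
    """Stand-in identity: residues in difflib's matching blocks divided by the
    shorter sequence length (an approximation, not an alignment score)."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Greedy incremental clustering: longest sequences first; each sequence
    joins the first representative reaching `threshold`, else founds a cluster."""
    clusters: dict[str, list[str]] = {}  # representative name -> members
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if approx_identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]  # no representative close enough
    return clusters


if __name__ == "__main__":
    toy = {
        "s1": "MKTAYIAKQRQISFVKSHFSRQ",
        "s2": "MKTAYIAKQRQISFVKSHFSRE",
        "s3": "GPWCNDEGPW",
    }
    print(greedy_cluster(toy, threshold=0.4))
```

Because every sequence must clear the cutoff directly against a representative, homologs that have diverged below the threshold end up in separate clusters, even when an MSA or a profile search would still relate them.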

tl;dr: Any advice or suggestions for clustering amino acid sequences to detect homologies?
