Clustering amino acid sequences to detect homologies
5.1 years ago
ddowlin ▴ 70

Hi all,

Basic question: I am interested in grouping a set of amino acid sequences into clusters that reflect evolutionary relationships.

I have a set of about 40 amino acid sequences from four yeast species. I want to know whether there are any homologs (either orthologs between species or paralogs within a species) among the 40 sequences. The 40 sequences include 4 sequences (one from each species) which I identified as orthologs using phmmer. I also added three known mammalian orthologs as a control.

I was advised to use Clustal Omega to align the sequences and then identify the clusters from the resulting cladogram. However, I am unsure how valid this method is when many of the input sequences may not be homologous to each other. How can we trust the resulting MSA, or any phylogeny based on it?

I used four aligners (Clustal Omega, MAFFT, T-Coffee, and MUSCLE). Each gives a different tree topology, although the three mammal sequences cluster together in all four trees and the four yeast homologs cluster together in two of them.
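To make "cluster together" concrete, here is a minimal sketch of the check I have in mind: does a given group of sequences form a clade in each tree? Everything in it is an assumption for illustration. The trees are written by hand as nested tuples (not parsed from aligner output), they are treated as rooted, and the leaf names are placeholders rather than my real sequence IDs.

```python
def leaf_sets(tree):
    """Return (all leaves under this node, list of clades as frozensets).

    A tree is either a leaf name (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)  # every internal node defines one clade
    return leaves, clades


def is_clade(tree, group):
    """True if `group` is exactly the leaf set of some internal node."""
    _, clades = leaf_sets(tree)
    return frozenset(group) in clades


# Hypothetical toy trees standing in for two aligners' results.
tree_a = ((("mam1", "mam2"), "mam3"), (("y1", "y2"), ("y3", "y4")))
tree_b = ((("mam1", "mam2"), "mam3"), (("y1", "other5"), ("y2", ("y3", "y4"))))

mammals = {"mam1", "mam2", "mam3"}
yeast = {"y1", "y2", "y3", "y4"}

for name, tree in [("tree_a", tree_a), ("tree_b", tree_b)]:
    print(name, "mammals:", is_clade(tree, mammals), "yeast:", is_clade(tree, yeast))
```

Because the trees are treated as rooted, the answer can depend on where the root is placed; for unrooted trees one would compare bipartitions instead.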

I have also tried CD-HIT (using a sequence identity threshold of 0.3, the lowest I could set). With this method the only cluster identified contains the three mammal sequences.
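For readers less familiar with identity-threshold clustering, below is a rough sketch of the general idea: greedy clustering against cluster representatives, longest sequence first. This is not CD-HIT's actual implementation. The scoring scheme (match 1, mismatch -1, gap -1), the definition of identity (identical aligned residues divided by the length of the shorter sequence), and the toy sequences are all assumptions chosen for illustration.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Identity from a simple Needleman-Wunsch global alignment.

    Returns identical aligned residues / length of the shorter sequence.
    Assumes both sequences are non-empty.
    """
    # Each cell holds (alignment score, identical residues on that path).
    prev = [(j * gap, 0) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [(i * gap, 0)]
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            diag = (prev[j - 1][0] + (match if same else mismatch),
                    prev[j - 1][1] + (1 if same else 0))
            up = (prev[j][0] + gap, prev[j][1])
            left = (curr[j - 1][0] + gap, curr[j - 1][1])
            # Highest score wins; ties go to the path with more identities.
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1][1] / min(len(a), len(b))


def greedy_cluster(seqs, threshold):
    """Cluster a {name: sequence} dict; the first name in each cluster is its representative."""
    order = sorted(seqs, key=lambda n: len(seqs[n]), reverse=True)
    clusters = []
    for name in order:
        for cluster in clusters:
            if global_identity(seqs[name], seqs[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # no representative was close enough
    return clusters


# Hypothetical toy sequences, not my real data.
toy = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQK", "c": "GGGGGWWWWW"}
print(greedy_cluster(toy, threshold=0.3))
```

A sequence is only ever compared with cluster representatives, so two distant homologs that each fall below the threshold against the representative end up in separate clusters even if a third sequence would link them.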

tl;dr Any advice or suggestions for

protein amino acid clustering MSA • 1.5k views
