Clustering amino acid sequences to detect homologs
5.1 years ago
ddowlin

Hi all,

Basic question: I want to cluster a set of amino acid sequences into groups that reflect their evolutionary relationships.

I have a set of about 40 amino acid sequences from four yeast species. I want to know whether any of the 40 sequences are homologs (either orthologs between species or paralogs within a species). The 40 include 4 sequences (one from each species) that I identified as orthologs using phmmer. In addition, I added three known mammalian orthologs as a control.
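As a quick cross-check on the four putative orthologs, a common heuristic is the reciprocal-best-hit (RBH) criterion: two sequences from different species are accepted as a candidate ortholog pair only if each is the other's top-scoring hit in the other species. Below is a minimal Python sketch, assuming the pairwise bit scores (for example from phmmer or BLASTP searches) have already been parsed into a dictionary; the sequence ids and scores are made up for illustration.

```python
def best_hits(scores, species_of):
    """For each query, find its best-scoring target in every other species.

    scores: dict mapping (query, target) -> bit score (higher is better).
    species_of: dict mapping sequence id -> species name.
    Returns a dict mapping (query, target_species) -> best target id.
    """
    best, best_score = {}, {}
    for (query, target), score in scores.items():
        if species_of[query] == species_of[target]:
            continue  # only between-species hits matter for orthology
        key = (query, species_of[target])
        if key not in best_score or score > best_score[key]:
            best_score[key] = score
            best[key] = target
    return best


def reciprocal_best_hits(scores, species_of):
    """Return the set of unordered pairs that are each other's best hit."""
    best = best_hits(scores, species_of)
    pairs = set()
    for (query, _target_species), target in best.items():
        # Accept the pair only if the target's best hit back is the query.
        if best.get((target, species_of[query])) == query:
            pairs.add(frozenset((query, target)))
    return pairs


# Hypothetical bit scores between sequences from two yeast species.
species_of = {"A1": "sp1", "A2": "sp1", "B1": "sp2", "B2": "sp2"}
scores = {
    ("A1", "B1"): 250.0, ("B1", "A1"): 245.0,
    ("A1", "B2"): 90.0,  ("B2", "A1"): 95.0,
    ("A2", "B2"): 60.0,  ("B2", "A2"): 55.0,
    ("A2", "B1"): 70.0,  ("B1", "A2"): 65.0,
}
print(sorted(sorted(p) for p in reciprocal_best_hits(scores, species_of)))
# [['A1', 'B1']]
```

RBH only yields one-to-one pairs between species, so it says nothing about within-species paralogs, and it can miss true orthologs after gene duplication or loss.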

I was advised to use Clustal Omega to align the sequences and then identify clusters from the resulting cladogram. However, I am unsure how valid this method is when some of the sequences may not be homologous at all. How can we trust the resulting MSA, or any phylogeny based on it?
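An alternative that avoids forcing possibly unrelated sequences into a single MSA is to run an all-vs-all search first (for example phmmer or BLASTP), treat every pair with a significant E-value as an edge in a graph, and take the connected components (single-linkage clusters) as candidate homologous families; each family can then be aligned on its own. This is a swapped-in technique, not the Clustal Omega workflow from the question. The sketch below assumes the E-values have already been parsed into a dictionary; the ids, values and the `1e-5` cutoff are illustrative assumptions.

```python
def homology_clusters(evalues, ids, cutoff=1e-5):
    """Single-linkage clustering: connected components of the graph whose
    edges are sequence pairs with E-value <= cutoff.

    evalues: dict mapping (id_a, id_b) -> E-value (direction does not matter).
    ids: all sequence ids, so sequences with no significant hit appear as
         singleton clusters.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), e in evalues.items():
        if a != b and e <= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two components

    clusters = {}
    for i in parent:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical all-vs-all E-values for four yeast and two mammalian sequences.
ids = ["yA", "yB", "yC", "yD", "m1", "m2"]
evalues = {
    ("yA", "yB"): 1e-30, ("yB", "yC"): 1e-8, ("yC", "yD"): 0.5,
    ("m1", "m2"): 1e-50, ("yA", "m1"): 1e-3,
}
print(homology_clusters(evalues, ids))
# [['m1', 'm2'], ['yA', 'yB', 'yC'], ['yD']]
```

Single linkage is prone to chaining: one multidomain sequence with significant hits to two unrelated families will merge them, so it is worth checking which pair of sequences joins each large cluster. E-values also depend on the size of the searched database, so the cutoff is a judgment call.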

I tried four aligners (Clustal Omega, MAFFT, T-Coffee, and MUSCLE). Each gives a different tree topology, although the three mammalian sequences cluster together in all four trees and the four yeast orthologs cluster together in only two of them.
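One way to quantify how much the four topologies disagree is the Robinson-Foulds (RF) distance, which counts the bipartitions (splits) of the leaf set that occur in one unrooted tree but not the other; splits shared by every tree are the groupings that do not depend on the choice of aligner. The Python sketch below assumes each tree has already been converted to nested tuples of leaf names (Newick parsing is not shown), and the example trees are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def splits(tree):
    """Non-trivial bipartitions of a tree, treated as unrooted.

    Each split is stored as the side that does NOT contain the
    alphabetically first leaf, so the result does not depend on rooting.
    """
    all_leaves = frozenset(leaves(tree))
    anchor = min(all_leaves)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = clade if anchor not in clade else all_leaves - clade
        # Skip trivial splits (a single leaf vs. the rest) and the root.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    walk(tree)
    return found


def robinson_foulds(tree1, tree2):
    """Number of splits present in one tree but not in the other."""
    if leaves(tree1) != leaves(tree2):
        raise ValueError("trees must have the same leaf set")
    return len(splits(tree1) ^ splits(tree2))


# Hypothetical trees from two aligners: mammals agree, yeast pairs differ.
t_clustal = ((("m1", "m2"), "m3"), (("yA", "yB"), ("yC", "yD")))
t_mafft = ((("m1", "m2"), "m3"), (("yA", "yC"), ("yB", "yD")))
print(robinson_foulds(t_clustal, t_mafft))  # 4

# Splits found in every tree are robust to the choice of aligner.
shared = set.intersection(*(splits(t) for t in [t_clustal, t_mafft]))
print(sorted(sorted(s) for s in shared))
# [['m3', 'yA', 'yB', 'yC', 'yD'], ['yA', 'yB', 'yC', 'yD']]
```

An RF distance of 0 means the unrooted topologies are identical. Poorly supported splits (for example, low bootstrap values) can differ between trees by chance, so disagreements matter most where support is high.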

I also tried CD-HIT (using the lowest sequence identity threshold of 0.3). With this method, the only cluster identified contains the three mammalian sequences.
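The CD-HIT result is easier to interpret with the idea behind it in mind: greedy incremental clustering on percent identity. Sequences are processed longest first, and each joins the first cluster representative it matches at or above the threshold, otherwise it starts a new cluster. The Python sketch below is a simplified illustration of that idea, not CD-HIT itself (which, as far as I know, uses a short-word filter and banded alignment for speed). Identity here is the number of identical aligned residues from a plain Needleman-Wunsch global alignment divided by the length of the shorter sequence, similar in spirit to CD-HIT's default definition; the scoring values and example sequences are assumptions.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Identical aligned residues / length of the shorter sequence,
    from a simple Needleman-Wunsch global alignment."""
    n, m = len(a), len(b)
    if min(n, m) == 0:
        return 0.0
    # score[i][j]: best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal alignment and count identical aligned pairs.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        same = a[i - 1] == b[j - 1]
        if score[i][j] == score[i - 1][j - 1] + (match if same else mismatch):
            if same:
                identical += 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return identical / min(n, m)


def greedy_cluster(seqs, threshold):
    """Greedy incremental clustering: longest sequences first; each joins
    the first representative with identity >= threshold, else starts a
    new cluster. Returns a list of (representative_id, member_ids)."""
    clusters = []
    for sid in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for rep, members in clusters:
            if global_identity(seqs[sid], seqs[rep]) >= threshold:
                members.append(sid)
                break
        else:
            clusters.append((sid, [sid]))
    return clusters


# Hypothetical sequences: s2 differs from s1 at one position; s3 is unrelated.
seqs = {
    "s1": "MKTAYIAKQRQISFVKSHFSRQ",
    "s2": "MKTAYIAKQRQISFVKSHFSRE",
    "s3": "GDWLPQNEEHCVRTPLAGW",
}
print(greedy_cluster(seqs, threshold=0.9))
# [('s1', ['s1', 's2']), ('s3', ['s3'])]
```

Because identity-threshold clustering only groups sequences that are near-identical over most of their length, divergent protein homologs, which commonly share less than about 30% identity, are expected to stay as singletons even at a low cutoff; score- or profile-based similarity (E-values, profile HMMs) is usually more sensitive in that range.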

tl;dr Any advice or suggestions for
