Question: Use Conservation Or Evolutionary Rate To Infer Functional Relevance Of Amino Acid Positions?
6.3 years ago by
miquelduranfrigola 760 wrote:

Hi all,

I hope this question makes sense. I have a multiple sequence alignment and I'd like to propose positions that could have a functional relevance. Some positions are fully, or almost fully, conserved and I can quantify the conservation by means of Shannon entropies.
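For readers who want something concrete, here is a minimal sketch of the per-column Shannon entropy idea, followed by a ranking from most to least conserved. The toy alignment, the function names, and the choice to ignore gap characters are all assumptions made for illustration, not part of the original question.

```python
import math
from collections import Counter

def column_entropy(column, ignore_gaps=True):
    """Shannon entropy (in bits) of a single alignment column."""
    residues = [r for r in column if not (ignore_gaps and r == "-")]
    if not residues:
        return float("nan")  # column contains only gaps
    counts = Counter(residues)
    total = len(residues)
    # p * log2(1/p) summed over the observed residues
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def alignment_entropies(seqs):
    """Per-column entropies for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*seqs)]

# Hypothetical toy alignment: four sequences, five columns.
msa = [
    "MKVLA",
    "MKILA",
    "MRVLG",
    "MKLLS",
]

scores = alignment_entropies(msa)
# Lowest entropy first, i.e. most conserved columns at the top of the list.
ranked = sorted(range(len(scores)), key=lambda i: scores[i])

for i in ranked:
    print(f"column {i}: entropy = {scores[i]:.3f} bits")
```

A fully conserved column scores 0 bits, and the score grows as residues become more evenly mixed. Note that this treats all sequences as equally informative and all substitutions as equally severe.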

However, I've found out that some people are measuring evolutionary rates of positions, for example with Rate4Site. Here, slowly evolving positions are thought to be relevant. To what extent is this notion different from residue conservation? If I want to find residues that may have functional or structural relevance, what would you guys measure?


conservation • 2.6k views
modified 3.6 years ago by Biostar ♦♦ 20 • written 6.3 years ago by miquelduranfrigola 760

Ultimately, both methods are trying to give you an idea of functional relevance through primary sequence alignment. You are using information content to quantify conservation, while Rate4Site first builds a tree and then uses that information to calculate conservation.

How exactly are you going to use Shannon entropies to ascertain functional relevance? Is anything with low entropy simply counted as conserved? What is your threshold, and how did you come up with it?

written 6.3 years ago by Damian Kao 15k

While in theory methods that take the evolutionary tree into account should outperform information-theory-based approaches, Rate4Site has not been shown to outperform them significantly. For Shannon entropy (and its variants) as a conservation score, there are numerous publications showing decent performance (see my post below).

written 6.3 years ago by Michael Schubert 6.8k

Rate4Site is OK, but it only calculates the site rate, and with a rather limited model of evolution at that. I would also point out that the Capra method, while primarily an information-theoretic measure, does, I believe, attempt to weight its conservation score based on the distance between sequences. So it is attempting to weight based on the amount of diversity within and between the samples.
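To make the idea of weighting by sequence diversity concrete, here is a rough sketch using position-based sequence weights in the style of Henikoff & Henikoff (1994), followed by a weighted entropy. This is a swapped-in, simple weighting scheme chosen for illustration; it is not claimed to be what the Capra method actually implements. The toy alignment and function names are hypothetical, and gaps are treated as ordinary symbols for simplicity.

```python
import math
from collections import Counter, defaultdict

def henikoff_weights(seqs):
    """Position-based weights: sequences carrying rarer residues get more weight."""
    weights = [0.0] * len(seqs)
    for column in zip(*seqs):
        counts = Counter(column)
        k = len(counts)  # number of distinct symbols in this column
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (k * counts[residue])
    total = sum(weights)
    return [w / total for w in weights]  # normalised to sum to 1

def weighted_entropy(column, weights):
    """Shannon entropy (bits) using weighted residue frequencies."""
    freq = defaultdict(float)
    for residue, w in zip(column, weights):
        freq[residue] += w
    return sum(p * math.log2(1.0 / p) for p in freq.values() if p > 0)

# Hypothetical toy alignment: three identical sequences plus one divergent one.
msa = [
    "MKVLA",
    "MKVLA",
    "MKVLA",
    "MRILG",
]

w = henikoff_weights(msa)
uniform = [1.0 / len(msa)] * len(msa)
column = [s[1] for s in msa]  # second column: K, K, K, R

print("weights:", [round(x, 3) for x in w])
print("unweighted entropy:", round(weighted_entropy(column, uniform), 3))
print("weighted entropy:  ", round(weighted_entropy(column, w), 3))
```

The three redundant sequences share their influence, so the single divergent sequence counts for more, and the column looks less conserved than the raw counts suggest.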

I would also take all of the performance measures with a grain of salt. In my experience the selection of datasets to test on is highly biased and often downright uninformative. While I was working on predicting functional divergence and shifts in functional importance, I explored this issue a little bit late last year:

written 6.3 years ago by Dan Gaston 7.1k

Admittedly, my naive approach was to rank by entropy and manually explore the top-ranking positions (the "more conserved") for functional or structural rationale.

written 6.3 years ago by miquelduranfrigola 760
6.3 years ago by
Cambridge, UK
Michael Schubert 6.8k wrote:

Reading material:

  • Prediction of protein functional residues from sequence by probability density estimation, Fischer et al. (2008), Bioinformatics [html]: general design principles of a conservation score and ways to improve it

  • Predicting Protein Ligand Binding Sites by Combining Evolutionary Sequence Conservation and 3D Structure, Capra et al. (2009) [html]: the effects of structure on the prediction of ligand binding residues

  • Automatic prediction of catalytic residues by modeling residue structural neighborhood, Cilia & Passerini (2010) [html]: same as above, but much more fine-grained and using machine learning (SVMs)

  • Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties, Petrova & Wu (2006) [html]: how to use machine learning for identification of catalytic residues with a conservation score only as an input

Bear in mind that I worked on this 2 years ago, so there will be more up-to-date literature. Nevertheless, I found these articles extremely useful as an introduction.

written 6.3 years ago by Michael Schubert 6.8k

Great stuff! Thx Michael.

written 6.3 years ago by miquelduranfrigola 760

All great references to read BTW

written 6.3 years ago by Dan Gaston 7.1k
6.3 years ago by
Dan Gaston 7.1k wrote:

I would advocate using a program that doesn't just look at how conserved a site is using information-theoretic measures, but that also incorporates the phylogeny of the sequences. The Evolutionary Trace program does this, and the most recent versions have a hybrid scoring approach that includes both evolutionary importance and a modified Shannon entropy measure. This is important because your species selection/biological sampling makes a difference, and most information-theory-based measurements don't account for the relatedness of your samples in even a rudimentary way.
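A small sketch can show why relatedness matters. Two columns with identical residue counts, and therefore identical Shannon entropy, can imply different numbers of substitutions once a tree is taken into account. The code below counts the minimum number of changes with Fitch parsimony on a fixed rooted binary tree. This is a deliberately simple stand-in: it is neither the Evolutionary Trace scoring nor Rate4Site's model-based rate estimation, and the tree, leaf names, and function name are hypothetical.

```python
def fitch_changes(tree, states):
    """Minimum number of substitutions for one column on a rooted binary tree.

    tree:   nested 2-tuples, with leaf names (str) at the tips
    states: dict mapping each leaf name to its residue in this column
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):          # leaf: its observed residue
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                         # children agree: no change needed
            return common
        changes += 1                       # children disagree: one substitution
        return left | right

    visit(tree)
    return changes

# Hypothetical tree: (a, b) form one clade, (c, d) the other.
tree = (("a", "b"), ("c", "d"))

# Both columns hold two K and two R, so both have 1 bit of Shannon entropy.
clustered = {"a": "K", "b": "K", "c": "R", "d": "R"}  # residues follow the clades
scattered = {"a": "K", "b": "R", "c": "K", "d": "R"}  # residues mixed across clades

print("clustered:", fitch_changes(tree, clustered), "change(s)")
print("scattered:", fitch_changes(tree, scattered), "change(s)")
```

The clustered column needs a single change on the branch separating the two clades, while the scattered column needs at least two. An entropy score cannot distinguish them; a tree-aware score can.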

written 6.3 years ago by Dan Gaston 7.1k

Nice. As usual, thanks for the assertive advice. I'll give Evolutionary Trace a try.

written 6.3 years ago by miquelduranfrigola 760