Question

Use Conservation Or Evolutionary Rate To Infer Functional Relevance Of Aminoacid Positions?

3

11.7 years ago

miquelduranfrigola ▴ 780

Hi all,

I hope this question makes sense. I have a multiple sequence alignment and I'd like to propose positions that could have a functional relevance. Some positions are fully, or almost fully, conserved and I can quantify the conservation by means of Shannon entropies.
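As a minimal sketch of the entropy-based quantification described above: the function names, the choice to ignore gap characters, and the toy alignment are all assumptions made for illustration, not part of any particular tool.

```python
import math
from collections import Counter

def column_entropy(column, gap_chars="-."):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps.

    Returns NaN for a column that contains only gaps, so that an empty
    column is not mistaken for a perfectly conserved one.
    """
    residues = [r for r in column if r not in gap_chars]
    if not residues:
        return math.nan
    total = len(residues)
    counts = Counter(residues)
    # H = sum over residues of p * log2(1 / p); 0 means fully conserved.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def alignment_entropies(sequences):
    """Per-column entropies for a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*sequences)]

# Hypothetical toy alignment, one string per sequence.
toy_alignment = ["MKVLA", "MKILA", "MRVLG", "MKVLS"]
for position, h in enumerate(alignment_entropies(toy_alignment), start=1):
    print(position, round(h, 3))
```

Low values mean high conservation. Note that this treats every sequence as an independent observation, which is exactly the assumption that tree-aware methods try to relax.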

However, I've found that some people measure the evolutionary rates of positions, for example with Rate4Site. Here, slowly evolving positions are thought to be relevant. To what extent is this notion different from residue conservation? If I want to find residues that may have functional or structural relevance, what would you guys measure?

Thanks!

conservation • 3.9k views

ADD COMMENT • link updated 9.0 years ago by Biostar 20 • written 11.7 years ago by miquelduranfrigola ▴ 780

3

Ultimately, both methods are trying to give you an idea of functional relevance through primary sequence alignment. You are using information content to quantify conservation, while Rate4Site first builds a tree and then uses that information to calculate conservation.

How exactly are you going to use Shannon entropies to ascertain functional relevance? Is anything with low entropy simply considered conserved? What is your threshold, and how did you come up with it?

ADD REPLY • link 11.7 years ago by Damian Kao 16k

3

While in theory methods that take the evolutionary tree into account should outperform information-theory-based approaches, Rate4Site has not been shown to outperform them significantly. For using Shannon entropy (SE) and its variants as a conservation score, there are numerous publications showing decent performance (see my post below).

ADD REPLY • link 11.7 years ago by Michael Schubert ★ 7.1k

3

Rate4Site is OK, but it only calculates the site rate, and with a rather limited model of evolution at that. I would also point out that the Capra method, while primarily an information-theoretic measure, does, I believe, attempt to weight its conservation score based on the distance between sequences. So it is attempting to weight based on the amount of diversity within and between the samples.
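To illustrate what weighting a conservation score by sequence diversity can look like, here is a rough sketch using position-based sequence weights in the style of Henikoff & Henikoff. That weighting scheme is an assumption picked for illustration; this is not a reimplementation of the Capra method or of Rate4Site, and all names below are hypothetical. Gaps are treated as an ordinary symbol to keep the example short.

```python
import math
from collections import Counter, defaultdict

def henikoff_weights(sequences):
    """Position-based sequence weights, normalized to sum to 1.

    In each column a sequence receives 1 / (k * n), where k is the number
    of distinct symbols in the column and n is how many sequences share
    this sequence's symbol. Redundant sequences end up with less weight.
    """
    weights = [0.0] * len(sequences)
    for column in zip(*sequences):
        counts = Counter(column)
        n_types = len(counts)
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (n_types * counts[residue])
    total = sum(weights)
    return [w / total for w in weights]

def weighted_column_entropy(column, weights):
    """Shannon entropy (bits) of a column using weighted residue frequencies."""
    freq = defaultdict(float)
    for residue, w in zip(column, weights):
        freq[residue] += w
    total = sum(freq.values())
    return sum((f / total) * math.log2(total / f) for f in freq.values() if f > 0)

# Hypothetical alignment: three identical sequences and one divergent one.
seqs = ["MKVLA", "MKVLA", "MKVLA", "MRVLG"]
w = henikoff_weights(seqs)
print([round(x, 3) for x in w])
print(round(weighted_column_entropy([s[1] for s in seqs], w), 3))
```

In this toy case the divergent sequence gets the largest weight, so the second column scores as less conserved than it would if the three identical sequences were counted as three independent observations.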

I would also take all of the performance measures with a grain of salt. In my experience the selection of datasets to test on is highly biased and often downright uninformative. While I was working on predicting functional divergence and shifts in functional importance, I explored this issue a little bit late last year:

http://www.ncbi.nlm.nih.gov/pubmed/21840876

ADD REPLY • link 11.7 years ago by DG 7.3k

1

Admittedly, my naive approach was to rank positions by entropy and manually explore the top-ranking (i.e. most conserved) ones for a functional or structural rationale.
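The ranking step can be sketched in a few lines. The per-position entropy values below are made up for illustration, and `rank_positions` is a hypothetical helper name; taking a fixed top N is just one arbitrary way to sidestep the threshold question.

```python
def rank_positions(entropies, top_n=10):
    """Return (1-based position, entropy) pairs, most conserved first.

    Lower entropy means higher conservation. Python's sort is stable,
    so tied positions keep their original alignment order.
    """
    ranked = sorted(enumerate(entropies, start=1), key=lambda pair: pair[1])
    return ranked[:top_n]

# Hypothetical per-position entropies (bits) for a five-column alignment.
entropies = [0.0, 0.81, 0.81, 0.0, 1.5]
for position, h in rank_positions(entropies, top_n=3):
    print(position, h)
```

One practical limitation shows up immediately: all fully conserved columns tie at zero, so the ranking alone cannot prioritize among them.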

ADD REPLY • link 11.7 years ago by miquelduranfrigola ▴ 780

Ram · Answer 1 · 2012-08-13

Reading material:

Prediction of protein functional residues from sequence by probability density estimation, Fischer et al. (2008), Bioinformatics: general design principles of a conservation score and ways to improve it
Predicting Protein Ligand Binding Sites by Combining Evolutionary Sequence Conservation and 3D Structure, Capra et al. (2009): the effects of structure on the prediction of ligand-binding residues
Automatic prediction of catalytic residues by modeling residue structural neighborhood, Cilia & Passerini (2010): same as above, but much more fine-grained and using machine learning (SVMs)
Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties, Petrova & Wu (2006): how to use machine learning to identify catalytic residues with only a conservation score as input

Bear in mind that I worked on this 2 years ago, so there will be more up-to-date literature. Nevertheless, I found these articles extremely useful as an introduction.

score 3 · Answer 2 · 2012-08-13

3

11.7 years ago

DG 7.3k

I would advocate using a program that doesn't just look at how conserved a site is using information-theoretic measures, but that also incorporates the phylogeny of the sequences. The Evolutionary Trace program does this, and the most recent versions have a hybrid scoring approach that includes both evolutionary importance and a modified Shannon entropy measure. This is important because your species selection/biological sampling makes a difference, and most information-theoretic measures don't account for the relatedness of your samples in even a rudimentary way.

ADD COMMENT • link 11.7 years ago by DG 7.3k

0

Nice. As usual, thanks for the assertive advice. I'll give Evolutionary Trace a try.

ADD REPLY • link 11.7 years ago by miquelduranfrigola ▴ 780