What Algorithms are Needed in the Sequencing Community?
3.2 years ago
beneopp • 0

Hi, I am a bioinformatics researcher on a team that is thinking of developing an algorithm to choose the appropriate substitution matrix when annotating a protein sequence. The choice would take into account the evolutionary divergence time (in years) between the species of the query sequence and the species of the reference sequences. Would that be helpful? If not, what would be helpful? I am new to the field, so if people can recommend articles to look at, that would be appreciated. Thanks.

alignment sequencing • 387 views

If I'm understanding correctly, you want to devise an 'evolution-sensitive' substitution matrix or something similar?

On the face of it that sounds interesting and useful. However, defining the evolutionary relationship between two arbitrary sequences is non-trivial. One might imagine using a molecular clock, for instance, but clocks are notoriously difficult to calibrate properly and depend wholly on the data available.
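To make the molecular clock idea concrete, here is a minimal sketch of how a divergence time might be turned into an expected sequence distance. It assumes a strict clock (one constant rate on both lineages) and a simple Poisson model of substitution; the function names and the rate used in the usage lines are hypothetical placeholders, not calibrated values.

```python
import math


def expected_distance(rate_per_site_per_year, years_since_split):
    """Expected substitutions per site between two present-day sequences.

    Assumes a strict molecular clock: both lineages accumulate
    substitutions at the same constant rate after the split,
    hence the factor of 2.
    """
    return 2.0 * rate_per_site_per_year * years_since_split


def expected_identity(distance):
    """Expected fraction of unchanged sites under a Poisson model.

    This is the inverse of the Poisson correction d = -ln(1 - p),
    where p is the fraction of differing sites.
    """
    return math.exp(-distance)


# Hypothetical rate, for illustration only.
d = expected_distance(rate_per_site_per_year=1e-9, years_since_split=100e6)
print(d, expected_identity(d))
```

The calibration problem mentioned above lives entirely in the rate argument: it varies between proteins and lineages, so any matrix chosen from a value like this inherits that uncertainty.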

I'm not sure if this will be especially useful for annotation, since methods like HMMs already provide pretty good homology information for divergent sequences. I can see it potentially being useful in sequence alignment though. Current general-purpose substitution matrices for amino acids (e.g. BLOSUM, PAM) are estimated from substitution frequencies pooled across many protein families, so they basically just reflect amino acid biochemical similarity. The idea is that nature is likely to conserve function, for which biochemical properties are key (or at least, it's likely to purge the substitutions that don't conserve it).

If you could determine a matrix based on some solid estimate of how this particular genetic environment has actually evolved, you might be able to say "in bacteria X, leucine is commonly mutated to alanine (or whatever), whereas in a normal matrix the common substitution would be isoleucine". This will almost certainly need calibrating on a per-species/strain basis though. I also don't know whether you're going to see an 'evolutionary signal bias' strong enough to outweigh the biochemical property preservation that existing matrices capture, so you could end up doing a lot of work for no improvement.
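As a rough illustration of what such a lineage-specific matrix could look like, the sketch below estimates log-odds scores from pairwise alignments restricted to one clade. It follows the general observed-versus-expected log-odds idea behind BLOSUM-style matrices, but leaves out sequence clustering and weighting. The function name `log_odds_matrix`, the pseudocount, and the toy alignments are all assumptions made for the example.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, pseudocount=1.0):
    """Estimate symmetric log2-odds substitution scores.

    aligned_pairs: iterable of (seq_a, seq_b) strings of equal length,
    already aligned; columns containing a gap ('-') are skipped.
    Returns a dict mapping (residue_x, residue_y) -> score.
    """
    pair_counts = Counter()
    for seq_a, seq_b in aligned_pairs:
        if len(seq_a) != len(seq_b):
            raise ValueError("aligned sequences must have equal length")
        for x, y in zip(seq_a, seq_b):
            if x == "-" or y == "-":
                continue
            # Substitutions are treated as undirected: (L, A) == (A, L).
            pair_counts[tuple(sorted((x, y)))] += 1

    residues = sorted({r for pair in pair_counts for r in pair})
    unordered = [(residues[i], residues[j])
                 for i in range(len(residues))
                 for j in range(i, len(residues))]

    # Observed pair frequencies, smoothed so unseen pairs get a finite score.
    total = sum(pair_counts[p] + pseudocount for p in unordered)
    observed = {p: (pair_counts[p] + pseudocount) / total for p in unordered}

    # Background residue frequencies implied by the pair frequencies.
    background = {r: 0.0 for r in residues}
    for (x, y), freq in observed.items():
        if x == y:
            background[x] += freq
        else:
            background[x] += freq / 2
            background[y] += freq / 2

    scores = {}
    for (x, y), freq in observed.items():
        expected = background[x] * background[y]
        if x != y:
            expected *= 2  # an unordered pair can arise in two orders
        score = math.log2(freq / expected)
        scores[(x, y)] = score
        scores[(y, x)] = score
    return scores


# Toy alignments standing in for homologous pairs from one clade.
toy_pairs = [("LLLA", "LLAA"), ("LILA", "LALA")]
clade_scores = log_odds_matrix(toy_pairs)
print(clade_scores[("L", "A")], clade_scores[("L", "I")])
```

Comparing scores like these against a general-purpose matrix is one way to check whether a clade-specific signal exists at all before investing in a full method. A real estimate would need far more alignment data than this toy input.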

This is also only going to be applicable to genomes which are well studied already, such that you could even begin to create an 'evolution matrix', ideally with some ancestral/ancient samples.

