Question: Working Principle of Ancestral Sequence Reconstruction (ASR)
2.3 years ago by
johnnytam100100
johnnytam100100 wrote:

Hi, I am working on the ancestral sequence reconstruction of a protein of interest. I am trying to understand the working principle and have looked into related topics such as parsimony and maximum likelihood to understand how people deduce the state at an ancestral node (i.e. the likely ancestral aa residue). Suppose we have an aligned column in an alignment like:

```
. . . A . . .
. . . A . . .
. . . A . . .
. . . A . . .
. . . A . . .
. . . G . . .
```

I guess that, under the assumption that evolutionary events are rare (parsimony), the best model (maximum likelihood) to describe the alignment would have "A" at the ancestral node.

Then I have a dumb question: what is the difference between simply comparing the percentage of each aa residue and picking the most frequent one, and the actual, computationally intensive procedure of deducing the ancestral node with the maximum likelihood method?

Thanks a lot!

modified 2.3 years ago by Brice Sarver3.6k • written 2.3 years ago by johnnytam100100
2.3 years ago by
Brice Sarver3.6k
United States
Brice Sarver3.6k wrote:

You're working with an alignment here, i.e., an inference of homology. As a result, there will be some underlying phylogenetic structure that you can infer which describes the relationships among the different lineages. You can also specify a model of sequence evolution that will inform your analysis (e.g., the G > A change could be very unlikely in your data for a variety of reasons).
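To make the point about the model concrete, here is a minimal sketch of a two-state continuous-time Markov model in which A > G and G > A changes happen at different rates. The function name `two_state_probs`, the rates, and the branch length are all made up for illustration; real analyses use a 20-state amino acid model with empirically estimated rates.

```python
import math

def two_state_probs(rate_ag, rate_ga, t):
    """Transition probabilities over a branch of length t for a toy
    two-state model (states "A" and "G") with unequal rates.
    Keys are (parent_state, child_state)."""
    total = rate_ag + rate_ga
    moved = 1.0 - math.exp(-total * t)   # how far the chain has relaxed toward equilibrium
    p_ag = rate_ag / total * moved       # P(child = G | parent = A)
    p_ga = rate_ga / total * moved       # P(child = A | parent = G)
    return {
        ("A", "A"): 1.0 - p_ag, ("A", "G"): p_ag,
        ("G", "A"): p_ga,       ("G", "G"): 1.0 - p_ga,
    }

# Hypothetical rates: A > G is ten times faster than G > A.
probs = two_state_probs(rate_ag=1.0, rate_ga=0.1, t=0.5)
for (parent, child), p in probs.items():
    print(f"P({child} | {parent}) = {p:.4f}")
```

With these assumed rates, the same branch length gives a much smaller probability to G > A than to A > G, which is exactly the kind of asymmetry a simple count of residues cannot see.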

Briefly, when calculating the single-site likelihood, you are mathematically incorporating not just the character states at the tips but also the (possible) character states at the nodes. This is the heart of the question you're asking. So, under parsimony, you're correct that the simplest estimate of the character state(s) for a node that will give rise to the observed character states will be A, with a shift to G somewhere down the line. ML-based approaches take this a step further; you can think of it roughly as 'weighting' the possible states at the node, effectively capturing something about the evolutionary process. Imagine a slightly more complex scenario in an alignment where you have 6 As and 4 Gs. The maximum likelihood estimate of the ancestral state at this node will be a function of your input data and parameters related to the evolutionary process. You may be much more confident that the likely ancestral state is an A than you would be from the parsimony calculation, which, in its simplest form, will return either A or {A, G}, depending on how you're considering it. Having likelihood-based ancestral state estimates captures this for less obvious situations.
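As a rough illustration of why this differs from picking the most frequent residue, here is a small sketch of a marginal reconstruction at the root for a single alignment column. It assumes an equal-rates (Poisson) model over the 20 amino acids, a uniform prior at the root, and a made-up tree with made-up branch lengths; the tree encoding (a tip is a string, an internal node is a list of `(child, branch_length)` pairs) and all function names are hypothetical, not taken from any ASR package.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO_ACIDS)

def transition_prob(same, t):
    """Equal-rates model: P(child state | parent state) over branch length t."""
    decay = math.exp(-K / (K - 1) * t)
    return 1 / K + (1 - 1 / K) * decay if same else (1 - decay) / K

def partial_likelihoods(node):
    """Pruning step: for each state s, P(tips below node | node is in state s)."""
    if isinstance(node, str):  # a tip with an observed residue
        return {s: 1.0 if s == node else 0.0 for s in AMINO_ACIDS}
    result = {s: 1.0 for s in AMINO_ACIDS}
    for child, t in node:
        child_l = partial_likelihoods(child)
        for s in AMINO_ACIDS:
            result[s] *= sum(transition_prob(s == c, t) * child_l[c]
                             for c in AMINO_ACIDS)
    return result

def root_posterior(tree):
    """Posterior probability of each state at the root (uniform prior cancels)."""
    lik = partial_likelihoods(tree)
    total = sum(lik.values())
    return {s: lik[s] / total for s in AMINO_ACIDS}

def tip_states(node):
    """Collect the observed residues at the tips, ignoring the tree."""
    if isinstance(node, str):
        return [node]
    return [state for child, _ in node for state in tip_states(child)]

def majority_vote(tree):
    return Counter(tip_states(tree)).most_common(1)[0][0]

# Made-up tree: one G tip almost on the root, plus a distant clade of three
# nearly identical A sequences hanging off a long branch.
skewed_tree = [("G", 0.01), ([("A", 0.01), ("A", 0.01), ("A", 0.01)], 1.0)]

posterior = root_posterior(skewed_tree)
best = max(posterior, key=posterior.get)
print("majority vote:", majority_vote(skewed_tree))
print("most probable root state:", best, f"(posterior {posterior[best]:.3f})")
```

In this toy tree the residue count favours A (three tips against one), but the three A sequences are close relatives far from the root while the G tip sits right next to it, so the likelihood calculation puts most of the posterior on G. On a star tree with equal branch lengths, as in the 5 As and 1 G column, the two approaches agree on A, and the posterior additionally tells you how confident that call is.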

Thanks for the comment! So do you mean that, although the result sometimes approximates picking the most frequent aa residue, the ML-based method computes the best guess of the ancestral residue more rigorously and gives better confidence in complicated situations, e.g. 6 As and 4 Gs? Sorry, I do not completely understand your explanation, but I am trying to grasp your idea.