Question: Maximum Parsimony (and the Median Problem) vs Maximum Likelihood for phylogeny construction
18 months ago by Lucas Peres80 (Brazil, Belém), who wrote:

TL;DR: What's the point of doing research on Maximum Parsimony methods for phylogeny construction if Maximum Likelihood methods have been the standard for some time and, as far as I know, yield better and more consistent results? Are there situations where Maximum Parsimony methods are better? What's the point of doing research on rearrangement events in isolation when more robust models exist? What's the biological relevance? Could you point me to some papers that compare the methods and survey the range of applications of each?

Hey, so I'm an undergrad in computer science writing a dissertation about the DCJ (Double Cut and Join) Median Problem.

For those who don't know, the DCJ is a model that represents various rearrangement operations (reversals, transpositions, etc.) in a genome. It's a robust model for computing the edit distance between two genomes, but as far as I know it doesn't handle indels or duplicated genes, although there are papers that try to extend it. Here is the problem statement: given genomes A, B and C, find a fourth genome D such that d(A,D) + d(B,D) + d(C,D) is minimized, where d is the DCJ distance between two genomes. This problem is known to be NP-hard, so practical algorithms rely on approximations, heuristics, and so on.
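For intuition, here is a minimal sketch of the DCJ distance and the median objective, under my own simplifying assumptions: signed, circular, unichromosomal genomes on genes 1..n, no indels or duplicates. The brute-force median search is purely illustrative (the real problem is NP-hard, as noted); all function names are hypothetical.

```python
# Sketch: DCJ distance via cycles of the adjacency graph (d = n - c for
# circular genomes), plus an exhaustive median search for tiny instances.
from itertools import permutations, product

def adjacencies(genome):
    """Adjacency set of a signed circular genome given as a tuple of ints.
    Gene +g runs from extremity (g, 't') to (g, 'h'); -g runs the other way."""
    ends = []
    for g in genome:
        if g > 0:
            ends.append(((g, 't'), (g, 'h')))
        else:
            ends.append(((-g, 'h'), (-g, 't')))
    n = len(ends)
    # the right extremity of each gene is adjacent to the left extremity
    # of the next gene around the circle
    return {frozenset((ends[i][1], ends[(i + 1) % n][0])) for i in range(n)}

def dcj_distance(a, b):
    """DCJ distance between two circular genomes on the same genes 1..n:
    d = n - c, where c counts cycles in the combined adjacency graph."""
    n = len(a)
    # union-find over the 2n gene extremities to count cycles
    parent = {(g, e): (g, e) for g in range(1, n + 1) for e in "th"}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for adj in adjacencies(a) | adjacencies(b):
        x, y = tuple(adj)
        parent[find(x)] = find(y)
    cycles = len({find(x) for x in parent})
    return n - cycles

def brute_force_median(a, b, c):
    """Try every signed gene order (feasible only for tiny n) and return
    the genome D minimizing d(A,D) + d(B,D) + d(C,D)."""
    genes = sorted(abs(g) for g in a)
    best, best_score = None, None
    for order in permutations(genes):
        for signs in product((1, -1), repeat=len(genes)):
            d = tuple(s * g for s, g in zip(signs, order))
            score = dcj_distance(a, d) + dcj_distance(b, d) + dcj_distance(c, d)
            if best_score is None or score < best_score:
                best, best_score = d, score
    return best, best_score
```

For example, reversing gene 2 is a single DCJ operation, so `dcj_distance((1, 2, 3), (1, -2, 3))` is 1, and the median of `(1, 2, 3)`, `(1, 2, 3)`, `(1, -2, 3)` achieves total score 1.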

You may have noticed by now that this problem falls under the umbrella of Maximum Parsimony methods. I wanted to give you a brief overview so you could see the big picture. I have been doing a literature review to start my dissertation, but after coming across broader papers and talking to some biologist colleagues, it seems that Maximum Likelihood methods are the standard in phylogenetics. Colleagues in the lab where I work use programs like MrBayes and RAxML on a daily basis. From what I have read so far, these methods are considered more consistent, realistic, etc.

So I started to ponder: what's the point of doing research on Maximum Parsimony methods at all? What are the advantages? Are they worth it? I have seen many papers that treat certain rearrangement events in isolation, like sorting by reversals, sorting by transpositions, etc. Why do research on these events in isolation if there are models like the DCJ and others (HP, SCJ, etc.)? What's the biological relevance? Could you point me to some papers that make comparisons and show where one can be used over the other? Sorry if this is a naive question (it probably is), but I'm trying to find a justification for my work. Anyway, thoughts? :)

Answer:

ML is popular for actual phylogenetic reconstruction. Essentially you're asking what the sequences might have looked like at the deeper nodes of the tree, and using likelihood to 'simulate' them.

If you don't care about actual ancestry, but instead only care about demonstrating similarity, then distance and parsimony methods might be more relevant. These essentially operate on the assumption that two similar (or less distant) sequences are most parsimoniously explained if they share a recent common ancestor. There's no requirement for this to be the biological ground truth, however, because lateral gene transfer, convergent evolution and so on muddy the waters.
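The parsimony idea in this answer, explaining the data with the fewest changes, can be illustrated with Fitch's small-parsimony algorithm on a fixed tree. The tree encoding and the single-site data below are made-up illustrations, not anything from the discussion above.

```python
# Minimal sketch of Fitch's small-parsimony algorithm: bottom-up over a
# binary tree, intersecting child state sets when possible (no change)
# and taking the union at a cost of one change otherwise.
def fitch(tree, states):
    """tree: nested tuples of leaf names; states: leaf name -> character.
    Returns (candidate state set at this node, minimum number of changes)."""
    if isinstance(tree, str):                  # leaf: one known state, no cost
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], states), fitch(tree[1], states)
    common = ls & rs
    if common:                                 # children can agree: no change
        return common, lc + rc
    return ls | rs, lc + rc + 1                # disagreement costs one change

# toy example: four taxa, one aligned site
tree = (("A", "B"), ("C", "D"))
site = {"A": "A", "B": "A", "C": "G", "D": "A"}
root_states, score = fitch(tree, site)
# score is the minimum number of substitutions this tree needs for the site
```

On this toy input a single G→A (or A→G) substitution on the branch to taxon C explains the site, so the score is 1; summing such scores over sites and comparing trees is what tree-search parsimony methods do.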