How To Build A Tree Out Of 10000 Mitochondrial Sequences.
12.1 years ago
Mingkun ▴ 40

I have downloaded ~10000 human mitochondrial genomes (~16.5 kb each) from phylotree.org (a small number of the sequences are incomplete). I want to build a phylogenetic tree, infer the most recent common ancestor sequence, and find out where the mutations occur on the tree (I want to collect the mutations that occurred many times on the tree). My question is: which software can give me this information? 1) a tree from ~10000 sequences, 2) the most recent common ancestor sequence, 3) where the mutations occur on the tree. Thanks for your help.

mitochondria phylogenetics tree • 3.3k views
12.1 years ago

Phylogeny: you can try the recent Clustal Omega; its authors claim it can handle a large number of sequences. Or MAFFT. Recent ancestor: there are many tools, and I don't really know which one is good. Mutations: I am not sure what the question is. Do you want to know how often they occurred independently? You cannot determine this for certain, but you can estimate it if you assume parsimony. I am not sure there really is software for that. Biologically it is a simple question, but mathematically it is rather difficult to solve. You probably need to write a script that searches your tree for the mutation: at every node where it occurs, score it and stop searching below that node. This does assume that your ancestral sequences are of high fidelity, and I believe that is impossible.

Using MP (maximum parsimony) it might not be too difficult.
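To make the parsimony idea concrete, here is a minimal sketch of counting how many times each site must have changed on a fixed tree, using the Fitch small-parsimony algorithm. Everything in it is an assumption for illustration: the tree is a toy binary tree written as nested tuples, the names `fitch_changes` and `recurrent_sites` are made up, and gaps or ambiguous bases in incomplete sequences are not handled. It only gives the minimum number of changes per site, not a real estimate of the mutation rate.

```python
def fitch_changes(tree, leaf_states):
    """Fitch parsimony for one alignment column.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    leaf_states: dict mapping leaf name -> base at this site.
    Returns (possible_states_at_this_node, minimum_number_of_changes_below).
    """
    if isinstance(tree, str):  # leaf: its state is known, no changes below it
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_changes(left, leaf_states)
    right_set, right_changes = fitch_changes(right, leaf_states)
    common = left_set & right_set
    if common:  # children agree on at least one state: no extra change
        return common, left_changes + right_changes
    # children disagree: at least one change on the branches below this node
    return left_set | right_set, left_changes + right_changes + 1


def recurrent_sites(tree, alignment, min_changes=2):
    """Return {1-based site: change count} for sites needing >= min_changes."""
    length = len(next(iter(alignment.values())))
    hits = {}
    for pos in range(length):
        column = {name: seq[pos] for name, seq in alignment.items()}
        _, n_changes = fitch_changes(tree, column)
        if n_changes >= min_changes:
            hits[pos + 1] = n_changes
    return hits


# Toy data (made up): four sequences of length 2 on the tree ((A,B),(C,D)).
toy_tree = (("A", "B"), ("C", "D"))
toy_alignment = {"A": "AG", "B": "AA", "C": "GG", "D": "GA"}

# Site 1 (A,A,G,G) needs one change; site 2 (G,A,G,A) needs two.
print(recurrent_sites(toy_tree, toy_alignment))  # {2: 2}
```

For ~10000 sequences the recursion would probably need to be replaced by an iterative post-order traversal, and the tree would have to come from a real tree-building program rather than being typed in by hand.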

Concerning the mutation question, your understanding is right. I know Clustal and MAFFT can do the multiple alignment, but I don't know how to get the tree file (I am using MAFFT now, but I didn't find any option to output the tree file) :(

12.1 years ago

We have faced exactly the same need (using phylotree information for our own phylogenetic purposes), although we chose a completely different approach. It's true that the FASTA sequences are available at phylotree.org, but since they have already made the huge effort of placing them into a well-established phylogenetic tree, we decided not to waste that information but to use it on our side.

Keep in mind that mtDNA sequences and their mutations have been heavily studied, and there is really no point in rebuilding the tree other than to re-validate it. What we've done is to write a fairly simple script that parses the Excel file containing the tree and records each mutation present on the tree plus its neighbors. This allows redrawing the tree from any desired point, and hence reading the same tree using any particular mutation or branch (mt-MRCA, H2a2a, ...) as its origin. As an analogy, it would be like holding an octopus by one leg or by another: it will always be the same octopus, although you may be interested in different legs to get different ways of understanding its structure or morphology.
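The re-reading idea described above can be sketched roughly as follows. This is not the script mentioned in the answer, only an assumed minimal version: the tree is stored as an undirected graph whose branches carry mutation labels, and a breadth-first walk from any chosen node gives the same tree read from that origin. The node names, the mutation labels (`m1`, `m2`, ...), and the functions `build_adjacency` and `reroot` are all hypothetical, and the parsing of the Excel file is left out.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Store the tree as an undirected graph: node -> {neighbour: mutations}."""
    adjacency = defaultdict(dict)
    for parent, child, mutations in edges:
        adjacency[parent][child] = mutations
        adjacency[child][parent] = mutations
    return adjacency


def reroot(adjacency, new_root):
    """Read the same tree from new_root.

    Returns {node: (parent_seen_from_new_root, mutations_on_that_branch)}.
    """
    parents = {new_root: (None, [])}
    queue = deque([new_root])
    while queue:
        node = queue.popleft()
        for neighbour, mutations in adjacency[node].items():
            if neighbour not in parents:  # not visited yet
                parents[neighbour] = (node, mutations)
                queue.append(neighbour)
    return parents


# Toy tree (made-up clades and mutation labels), as (parent, child, mutations).
toy_edges = [
    ("root", "cladeA", ["m1", "m2"]),
    ("root", "cladeB", ["m3"]),
    ("cladeA", "cladeA1", ["m4"]),
]
toy_graph = build_adjacency(toy_edges)

# Seen from cladeA1, the original root becomes a descendant of cladeA.
print(reroot(toy_graph, "cladeA1")["root"])  # ('cladeA', ['m1', 'm2'])
```

Note that when a branch is read in the opposite direction, the ancestral and derived states of its mutations swap, so a real script would also need to flip each mutation label accordingly.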

I know this does not give you a straight answer (such as naming a software package to use), but I hope it gives you an idea of what you may really need to do.

Thanks for your answer. The only problem with using the "phylotree" tree is that many mutations are not included there. Some of these missing mutations even have a rather high frequency in the population, so it is hard to say whether they have a high mutation rate or not. Do you have any idea how to solve this problem?

The main idea behind phylotree is to report all known and validated mutations in mtDNA, since major sources of error are always present in this field. Confirming every newly reported mtDNA variant is indeed a great effort, so the best suggestion I can give anyone is to work with the latest valid tree, which includes what has already been curated. If you still want to work with all the information (it would be like using beta software, or like making inferences on dbSNP using ss IDs instead of rs IDs), you would be on your own, and your conclusions would be difficult to confirm afterwards.

Thanks. I will just use the Phylotree then.

Please consider that my advice is certainly biased, since I work in a research group with a deep forensic tradition, where sources of data contamination are always scrutinized to keep errors from propagating into our results. As I said, when dealing with mtDNA variation, using phylotree's tree is currently, in my honest opinion, the best thing to do. I hope you find it useful for your research.
