Question

What is the importance of partitions in phylogenomic analysis?

0

2.0 years ago

sunnykevin97 ▴ 980

Hi,

I estimated a phylogenetic tree from partial mitogenomes of around 100 taxa, using both maximum likelihood (RAxML) and Bayesian inference (BEAST2).

The topology is the same with both methods. I ran the analyses without partitioning the mitogenomes. I then performed a selection analysis on the mitogenomes using PAML and calculated pairwise dN/dS ratios. I noticed that some of the mitogenomes are under negative selection and some are under positive selection.

My question is: how do I know which genes in the mitogenomes are under selection? Is it important to perform the analysis using partitions?

Are there any tools that can split a multiple sequence alignment file into partitions?

Suggestions please.

gene genome • 1.1k views

ADD COMMENT • link updated 15 months ago by Dave Carlson ★ 1.7k • written 2.0 years ago by sunnykevin97 ▴ 980

score 2 · Answer 1 · 2022-04-21

2

2.0 years ago

Mensur Dlakic ★ 27k

I am answering only some of the questions.

Partitions define parts of the sequence alignment that (potentially) change under different evolutionary models than the rest. There is no software that can define the regions for you - they are known a priori.

Let's say that you are concatenating alignments of protein A (nuclear), B (cytoplasmic), and C (mitochondrial). Let's assume that in the concatenated alignment, protein A is residues 1-200, protein B is 201-300, and protein C is 301-500. Those are your partitions: non-overlapping column ranges that can each be reconstructed under a different substitution model.
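To make the idea of partitions as column ranges concrete, here is a minimal Python sketch that slices a concatenated alignment into one sub-alignment per partition. The function name `split_alignment`, the toy sequences, and the 1-based inclusive, non-overlapping range convention are all my assumptions for illustration; this is not taken from any particular tool.

```python
def split_alignment(alignment, partitions):
    """Slice a concatenated alignment into one sub-alignment per partition.

    alignment:  dict mapping taxon name -> aligned sequence (equal lengths)
    partitions: dict mapping partition name -> (start, end), 1-based inclusive
    """
    widths = {len(seq) for seq in alignment.values()}
    if len(widths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    width = widths.pop()

    result = {}
    for name, (start, end) in partitions.items():
        if not 1 <= start <= end <= width:
            raise ValueError(f"partition {name} ({start}-{end}) is outside 1-{width}")
        # Convert 1-based inclusive columns to a Python slice.
        result[name] = {taxon: seq[start - 1:end] for taxon, seq in alignment.items()}
    return result


# Toy data (made up for illustration): 10 columns, two partitions.
toy = {"taxon1": "MKVLAGGTSW", "taxon2": "MKILA-GTSW"}
parts = split_alignment(toy, {"A": (1, 4), "B": (5, 10)})
print(parts["A"])
print(parts["B"])
```

The same ranges work for any number of taxa, because only the columns are being cut.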

Given all the possible partitions, IQ-TREE's ModelFinder will test which model works best for each partition, and the reconstruction can then be done with the most appropriate model for each partition. That would look something like this:

begin sets;
  charset part1_part4_part5_part6_part7_part8_part10_part14 = 1-118  479-661  662-831  832-933  934-1083  1084-1197  1375-1463  1862-1955;
  charset part2_part11_part15 = 119-217  1464-1584  1956-2044;
  charset part3_part9_part12_part13_part16 = 218-478  1198-1374  1585-1663  1664-1861  2045-2174;
  charpartition mymodels =
    LG+F+I+G4: part1_part4_part5_part6_part7_part8_part10_part14,
    LG+G4: part2_part11_part15,
    LG+I+G4: part3_part9_part12_part13_part16;
end;

ADD COMMENT • link 2.0 years ago by Mensur Dlakic ★ 27k

0

Alright,

Making partitions manually is doable for a few samples. How about ~100 taxa?

For genomes it was straightforward to predict orthologs and then genetic inference. For mitogenomes it was a little difficult.

ADD REPLY • link 2.0 years ago by sunnykevin97 ▴ 980

0

Partitions are defined by proteins/genes (by individual sequence alignments), not by taxa. I don't know of any way to delineate them except manually. That means if I have a protein A with 5000 taxa, I make an alignment of all those sequences and count the number of aligned columns. Let's say there are 200 columns in the alignment of protein A, and another 150 in protein B. When I concatenate them into a single alignment, which is now 350 residues wide (and 5000 taxa long), my two partitions will be 1-200 and 201-350. It doesn't matter how many taxa I have, as the partitions are defined by the individual alignments that make up the concatenated alignment.
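The column counting described above is easy to script. Below is a rough Python sketch that turns a list of per-gene alignment widths into 1-based partition ranges and prints them as a NEXUS sets block. The helper names (`partition_ranges`, `nexus_sets_block`) and the gene labels are hypothetical; the widths are the 200 and 150 columns from the example.

```python
def partition_ranges(lengths):
    """Return {name: (start, end)} with 1-based inclusive, non-overlapping ranges.

    lengths: list of (name, n_columns) pairs in concatenation order.
    """
    ranges = {}
    start = 1
    for name, n_columns in lengths:
        if n_columns <= 0:
            raise ValueError(f"{name} has no aligned columns")
        end = start + n_columns - 1
        ranges[name] = (start, end)
        start = end + 1  # the next partition begins right after this one
    return ranges


def nexus_sets_block(ranges):
    """Format the ranges as a NEXUS sets block with one charset per partition."""
    lines = ["#nexus", "begin sets;"]
    lines += [f"  charset {name} = {start}-{end};" for name, (start, end) in ranges.items()]
    lines.append("end;")
    return "\n".join(lines)


# Widths from the example: protein A has 200 columns, protein B has 150.
ranges = partition_ranges([("proteinA", 200), ("proteinB", 150)])
print(nexus_sets_block(ranges))
```

The number of taxa never enters the calculation, only the width of each individual alignment and the order in which they are concatenated.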

ADD REPLY • link 2.0 years ago by Mensur Dlakic ★ 27k

2

Just to add to this: one convenient option for automated partition generation is the pxcat program in the Phyx tool suite:

https://github.com/FePhyFoFum/phyx

ADD REPLY • link 15 months ago by Dave Carlson ★ 1.7k

0

In the example above there is an alignment that is 2174 residues wide, and it is divided into 16 partitions, one for each of the proteins that were concatenated together. It shows you that partitions #2, #11 and #15 are considered to be under the same evolutionary model (LG substitution matrix, 4 categories of the gamma-shaped rate distribution), while the other partitions are grouped under different matrix and gamma combinations.
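If you want to sanity-check a sets block like the one above, a small Python sketch along these lines can parse the charset lines and confirm that the ranges tile the alignment without gaps or overlaps. The parsing is a simple regex of my own, not IQ-TREE's parser, and it assumes every range is written as start-end.

```python
import re

# The three charset lines from the example above.
SETS_BLOCK = """
begin sets;
  charset part1_part4_part5_part6_part7_part8_part10_part14 = 1-118  479-661  662-831  832-933  934-1083  1084-1197  1375-1463  1862-1955;
  charset part2_part11_part15 = 119-217  1464-1584  1956-2044;
  charset part3_part9_part12_part13_part16 = 218-478  1198-1374  1585-1663  1664-1861  2045-2174;
end;
"""


def parse_charsets(text):
    """Return {charset name: [(start, end), ...]} from NEXUS charset lines."""
    charsets = {}
    for name, body in re.findall(r"charset\s+(\S+)\s*=\s*([^;]+);", text):
        charsets[name] = [(int(a), int(b)) for a, b in re.findall(r"(\d+)-(\d+)", body)]
    return charsets


charsets = parse_charsets(SETS_BLOCK)
all_ranges = sorted(r for ranges in charsets.values() for r in ranges)

n_partitions = len(all_ranges)   # number of original partitions
width = all_ranges[-1][1]        # last column covered
# True if the ranges start at 1 and each one begins right after the previous ends.
tiles = all_ranges[0][0] == 1 and all(
    prev[1] + 1 == cur[0] for prev, cur in zip(all_ranges, all_ranges[1:])
)
print(n_partitions, width, tiles)  # prints: 16 2174 True
```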

ADD REPLY • link 2.0 years ago by Mensur Dlakic ★ 27k