Question

How to tell the best model for ML analysis using RaxML?

6.1 years ago

MAPK ★ 2.1k

I ran automatic protein model selection in RAxML with the command raxmlHPC-PTHREADS -T 20 -m PROTGAMMAAUTO -s aligned_seq.fas -p 12345 -n T12. The run produced the output below, but I can't tell which part of it identifies the best model to use.

Output from the run:

This is RAxML version 8.2.4 released by Alexandros Stamatakis on October 02 2015.

With greatly appreciated code contributions by:
Andre Aberer      (HITS)
Simon Berger      (HITS)
Alexey Kozlov     (HITS)
Kassian Kobert    (HITS)
David Dao         (KIT and HITS)
Nick Pattengale   (Sandia)
Wayne Pfeiffer    (SDSC)
Akifumi S. Tanabe (NRIFS)

Alignment has 9777 distinct alignment patterns

Proportion of gaps and completely undetermined characters in this alignment: 57.69%

RAxML rapid hill-climbing mode

Using 1 distinct models/data partitions with joint branch length optimization


Executing 1 inferences on the original alignment using 1 distinct randomized MP trees

All free model parameters will be estimated by RAxML
GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter

GAMMA Model parameters will be estimated up to an accuracy of 0.1000000000 Log Likelihood units

Partition: 0
Alignment Patterns: 9777
Name: No Name Provided
DataType: AA
Substitution Matrix: AUTO
Using fixed base frequencies




RAxML was called as follows:

raxmlHPC-PTHREADS -T 20 -m PROTGAMMAAUTO -s aligned_seq.fas -p 12345 -n T12 


Partition: 0 with name: No Name Provided
Base frequencies: 0.087 0.044 0.039 0.057 0.019 0.037 0.058 0.083 0.024 0.048 0.086 0.062 0.020 0.038 0.046 0.070 0.061 0.014 0.035 0.071 

Automatic protein model assignment algorithm using ML criterion:

    Partition: 0 best-scoring AA model: LG likelihood -376325.855350 with empirical base frequencies


Automatic protein model assignment algorithm using ML criterion:

    Partition: 0 best-scoring AA model: LG likelihood -375595.869217 with empirical base frequencies


Automatic protein model assignment algorithm using ML criterion:

    Partition: 0 best-scoring AA model: LG likelihood -375417.765314 with empirical base frequencies


Automatic protein model assignment algorithm using ML criterion:

    Partition: 0 best-scoring AA model: LG likelihood -375406.857988 with empirical base frequencies


Automatic protein model assignment algorithm using ML criterion:

    Partition: 0 best-scoring AA model: LG likelihood -375220.023642 with empirical base frequencies


Automatic protein model assignment algorithm using ML criterion:

    Partition: 0 best-scoring AA model: LG likelihood -375138.866616 with empirical base frequencies


Inference[0]: Time 3521.571673 GAMMA-based likelihood -375138.843356, best rearrangement setting 10
alpha[0]: 2.652696 


Conducting final model optimizations on all 1 trees under GAMMA-based models ....

Automatic protein model assignment algorithm using ML criterion:

    Partition: 0 best-scoring AA model: LG likelihood -375138.843357 with empirical base frequencies


Inference[0] final GAMMA-based Likelihood: -375138.842244 tree written to file /media/owner/raxml/RAxML_result.T12


Starting final GAMMA-based thorough Optimization on tree 0 likelihood -375138.842244 .... 

Final GAMMA-based Score of best tree -375138.842244

Program execution info written to /media/owner/raxml/RAxML_info.T12
Best-scoring ML tree written to: /media/owner/raxml/RAxML_bestTree.T12

Overall execution time: 4114.863478 secs or 1.143018 hours or 0.047626 days

Raxml modeltest phylogenetics • 6.0k views

updated 6.1 years ago by Michael 54k • written 6.1 years ago by MAPK ★ 2.1k

score 3 · Accepted Answer · 2018-06-10

Hi again,

LG likelihood -376325.855350 with empirical base frequencies

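Since every round of the automatic model assignment reports the same model, the automatic choice appears to be LG with empirical amino acid frequencies (+F) under the GAMMA model of rate heterogeneity, without invariant sites (+I). In RAxML that corresponds to PROTGAMMALGF. By the way, your alignment has a large proportion of gaps and undetermined characters (57.69%), so trimming or otherwise editing the alignment might improve it.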
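If you would rather not read through the log by eye, the repeated "best-scoring AA model" lines can be pulled out with a short script. The following is only a sketch: the regular expression is written against the line format shown in your output, and the function name last_model_choice is my own, not part of RAxML.

```python
import re

# Matches log lines of the form shown in the question, e.g.
#   Partition: 0 best-scoring AA model: LG likelihood -375138.843357 with empirical base frequencies
# The pattern is an assumption based on that output, not an official RAxML format spec.
MODEL_LINE = re.compile(
    r"Partition:\s*(\d+)\s+best-scoring AA model:\s*(\S+)"
    r"\s+likelihood\s+(-?\d+(?:\.\d+)?)"
    r"(?:\s+with\s+(\w+)\s+base frequencies)?"
)

def last_model_choice(info_text):
    """Return {partition: (model, log_likelihood, frequency_type)}.

    Later reports overwrite earlier ones, so each partition ends up
    with the model reported last in the log.
    """
    choices = {}
    for match in MODEL_LINE.finditer(info_text):
        partition, model, lnl, freqs = match.groups()
        choices[int(partition)] = (model, float(lnl), freqs)
    return choices

# Two lines copied from the log in the question.
sample = """\
    Partition: 0 best-scoring AA model: LG likelihood -376325.855350 with empirical base frequencies
    Partition: 0 best-scoring AA model: LG likelihood -375138.843357 with empirical base frequencies
"""
print(last_model_choice(sample))
```

To use it on the real run, pass in the contents of the info file named in your log, for example open("RAxML_info.T12").read().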

You could also run ProtTest3 for comparison. It lets you rank the models by different criteria (BIC, AIC) and shows the ranking of all models.