Question

ML tree inferred from multiple genes


12 months ago

Lada ▴ 30

Hi guys,

I am working with a directory of approximately 3,000 trimmed protein alignment files. Each alignment is a single-copy orthologue group containing sequences from 7 species, obtained with OrthoFinder. The FASTA headers have the form species name_geneID.

I want to find the best substitution models and then infer a Maximum Likelihood SPECIES tree (so 7 branches, one per species) based on these 3,000 genes.
I am doing this for the first time, so I would appreciate some help.

What is generally the best approach: concatenate all alignments and work on that, OR infer 3,000 gene trees (each with its own best substitution model) and then combine those somehow into a species tree (with IQ-TREE or RAxML)?

The final goal is to make a time-calibrated species tree in PAML (I already have some divergence time points) and then use it for subsequent evolutionary genomic studies (dN/dS etc.).

So far, I have used IQ-TREE to (hopefully) select substitution models for every alignment with the -p flag (pointing it at the folder of alignments), since this is what is described under the subsection "Inferring species trees":

iqtree-mpi -nt $NSLOTS  -p trimmed --prefix concat -wca -B 1000

I have a couple of questions:

Is this the command that I need? Is IQ-TREE calculating the best substitution model for each alignment here? I am a little confused by the term "partitioning scheme"...

I got a file called concat_best_model_nex with partition information. Are those the best substitution models for every gene (alignment)?
I also got a concat.tree file, but it is not the species tree with 7 branches (one per species) that I expected. Instead I got a big tree containing all FASTA sequences from every alignment (orthologue genes). I thought my FASTA headers were wrong, so I flipped them to geneID_species name and got the same result.

Note: I am testing the workflow on 8 alignments/genes only, just to be faster, but the result should be the same.

Thanks!

Lada

phylogeny iqtree ML phylotranscriptomics orthologues • 717 views


score 1 · Answer 1 · 2023-09-04


12 months ago

Joe 21k

I would treat each individual set of orthologues/genes separately. Create a gene tree for each (with whatever method appeals).

You can then use these gene trees with tools like ASTRAL-II (https://academic.oup.com/bioinformatics/article/31/12/i44/2155240) to compute the overall topology of the species/genera.
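One practical detail for the gene-tree route: summary methods generally match tips across gene trees by label, so the gene trees need the same 7 labels in every file. The same issue likely explains the "big tree" in the question, since IQ-TREE matches taxa across partitions by sequence name and species name_geneID is unique in every alignment. Below is a minimal Python sketch, not a definitive pipeline. It assumes the species label is everything before the first underscore in the header, that alignments end in .fa, and that gene trees are single-line Newick files ending in .treefile; the function names are hypothetical, and the pattern arguments should be adjusted to your files.

```python
from pathlib import Path


def species_from_header(header: str, sep: str = "_") -> str:
    """Return the species part of a header.

    Assumption: headers look like 'species<sep>geneID', so the species
    label is everything before the first separator.
    """
    return header.lstrip(">").strip().split(sep, 1)[0]


def relabel_fasta(text: str, sep: str = "_") -> str:
    """Rewrite every FASTA header so that only the species label remains."""
    out = []
    for line in text.splitlines():
        if line.startswith(">"):
            out.append(">" + species_from_header(line, sep))
        else:
            out.append(line)
    return "\n".join(out) + "\n"


def relabel_directory(src: str, dst: str, pattern: str = "*.fa") -> int:
    """Write relabelled copies of all alignments in src into dst.

    Returns the number of files written. The originals are left untouched.
    """
    dst_path = Path(dst)
    dst_path.mkdir(parents=True, exist_ok=True)
    count = 0
    for aln in sorted(Path(src).glob(pattern)):
        (dst_path / aln.name).write_text(relabel_fasta(aln.read_text()))
        count += 1
    return count


def collect_gene_trees(tree_dir: str, out_file: str,
                       pattern: str = "*.treefile") -> int:
    """Concatenate one-tree-per-file Newick files into a single file.

    The result has one gene tree per line, which is the usual input
    layout for summary methods. Returns the number of trees collected.
    """
    trees = []
    for tree_file in sorted(Path(tree_dir).glob(pattern)):
        newick = tree_file.read_text().strip()
        if newick:  # skip empty files from failed runs
            trees.append(newick)
    Path(out_file).write_text("\n".join(trees) + "\n")
    return len(trees)
```

A typical order would be: relabel_directory("trimmed", "trimmed_relabelled"), infer one tree per relabelled alignment with your tool of choice, then collect_gene_trees(...) to build the single input file for the summary method. If a species name itself contains an underscore, the split rule above will truncate it, so check a few headers before running it on all 3,000 files.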