Phylogenetic Tree from Massive Multifasta Alignment?
Asked by jdru, 9 months ago

Hi all,

I have a very large multifasta alignment (~30,000 sequences, each ~17,000 bases), and I am wondering whether it is too large to build a phylogenetic tree from. If not, which program would be most appropriate for this use case?

Thank you!

Tags: tree, alignment, fasta, phylogeny

How was the multifasta generated? Generally I would be very skeptical of the quality of any MSA of that size. Most tools break down long before that.

It was generated with MAFFT. I agree; building the tree is actually part of the post-processing/quality checking.

I would suggest RAxML-NG or IQ-TREE, though I believe IQ-TREE is faster than RAxML.

Unless the OP has thousands of cores, I think they would be better off with something like FastTree.

IIRC, IQ-TREE has a fast mode that performs comparably to FastTree.

Just curious: any reason you have and use two accounts?

Oh sorry, I forgot I had already made an account this summer to ask a question (before getting my DTU email). I will go delete the old one.

Answer by Mensur Dlakic, 9 months ago

Unless you are starting a new classification (a new tree of life?) or building some sort of public database, 30K sequences is completely unnecessary. For just about any other purpose I can think of, that many sequences is overkill. For publications or grants, it is not practical to inspect trees with more than a few hundred branches, and even those would have to be collapsed into groups.
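If you decide to cut the set down to a few hundred sequences, one simple option is to draw a random subset straight from the existing MAFFT alignment. The sketch below uses only the Python standard library. The function names (`read_fasta`, `subsample`, `to_fasta`) are hypothetical, and uniform random sampling is my assumption here: picking cluster representatives or sampling per clade would usually give a more informative subset. Because the gap characters are kept, the subset stays column-aligned, although some columns may end up as gaps in every kept sequence.

```python
import random
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs.

    Multi-line sequences are concatenated; any lines before the first
    '>' header are ignored.
    """
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def subsample(records: List[Record], n: int, seed: int = 0) -> List[Record]:
    """Keep n randomly chosen records (all of them if n >= len(records)),
    preserving their original order. A fixed seed makes the subset reproducible."""
    if n >= len(records):
        return list(records)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(records)), n))
    return [records[i] for i in keep]


def to_fasta(records: List[Record], width: int = 60) -> str:
    """Write records back to FASTA, wrapping sequences at `width` characters."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

For example, `to_fasta(subsample(read_fasta(open("aln.fasta").read()), 500))` would produce a 500-sequence alignment (the file name is a placeholder) that any of the tree programs mentioned above could then process far more quickly.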

Your purpose aside, it will be difficult to get this tree to converge. With IQ-TREE in fast bootstrap mode (a minimum of 1,000 bootstrap replicates, which may not be enough for you) and 20-40 CPUs, it takes half a day for a protein alignment of ~150 sequences that are ~15,000 residues each. That should give you some idea of the time needed when you scale up to your dataset, and I don't think the runtime scales linearly.
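One way to see why adding taxa hurts so much: the number of distinct unrooted, fully resolved trees on n labelled tips is (2n-5)!! = 3 × 5 × ... × (2n-5), a standard combinatorial result. Heuristic searches in IQ-TREE, RAxML-NG or FastTree never enumerate that space, so this is not a runtime model. It only illustrates how quickly the space they must explore grows. The helper names below are hypothetical; the log version relies on the identity (2n-5)!! = (2n-4)! / (2^(n-2) (n-2)!), which avoids building a huge integer.

```python
import math


def unrooted_topologies(n: int) -> int:
    """Exact number of unrooted binary tree topologies on n labelled tips: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # multiplies 3 * 5 * ... * (2n-5)
        count *= k
    return count


def log10_unrooted_topologies(n: int) -> float:
    """log10 of (2n-5)!!, computed via log-gamma as (2n-4)! / (2^(n-2) * (n-2)!)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    ln_value = math.lgamma(2 * n - 3) - (n - 2) * math.log(2) - math.lgamma(n - 1)
    return ln_value / math.log(10)
```

For instance, `unrooted_topologies(6)` returns 105, and comparing `log10_unrooted_topologies(150)` with `log10_unrooted_topologies(30000)` shows how many orders of magnitude separate the example above from the dataset in the question.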

If you still want to do it, you may want to take a look at ExaML:

https://cme.h-its.org/exelixis/web/software/examl/index.html