What is the fastest approach, and which software should I use, to build phylogenetic trees from WGS NGS data? I tried Geneious Pro, but it is extremely slow; an eternity would not be enough to process my 179 WGS tuberculosis files. Thank you.
When it comes to phylogenetics, fastest isn't always the best.
If you have WGS data, you first need to assemble the reads or call variants. Most tree-building programs take alignments of one or more loci in FASTA, Nexus, or PHYLIP format, so you'll need homologous loci across all of your samples.
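For reference-mapped, haploid data like tuberculosis genomes, one common way to get homologous sites is a SNP alignment. As a minimal sketch, assuming every sample was mapped to the same reference and haploid SNPs (no indels) were already called, the Python below writes a FASTA alignment of only the sites that differ from the reference in at least one sample. The function name `snp_alignment` and the input layout (sample name mapped to `{1-based position: alternate base}`) are hypothetical. Treating every uncalled site as the reference base is a simplifying assumption; in practice low-coverage sites should be marked as missing.

```python
from typing import Dict


def snp_alignment(reference: str, calls: Dict[str, Dict[int, str]]) -> str:
    """Hypothetical helper: build a FASTA alignment of variant sites only.

    reference: reference genome sequence (positions below are 1-based).
    calls: sample name -> {1-based position: alternate base}.
    Assumes haploid SNP calls only, and that a site with no call in a
    sample carries the reference base (real pipelines should use 'N'
    for sites with too little coverage to call).
    """
    # Every position that differs from the reference in at least one sample.
    positions = sorted({pos for snps in calls.values() for pos in snps})
    records = []
    for sample in sorted(calls):
        snps = calls[sample]
        # Alternate base where one was called, otherwise the reference base.
        seq = "".join(snps.get(pos, reference[pos - 1]) for pos in positions)
        records.append(f">{sample}\n{seq}")
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    calls = {"s1": {2: "T"}, "s2": {2: "T", 7: "A"}, "s3": {}}
    print(snp_alignment(ref, calls), end="")
```

Dropping invariant sites keeps the alignment small, which matters when you have hundreds of genomes; just be aware that some likelihood programs expect invariant sites to be accounted for when you give them SNP-only data.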
If you have such a dataset, your workflow would look something like the following (suggested programs are in parentheses):
- Perform a multiple sequence alignment on your locus/loci of interest (MAFFT, MUSCLE, Clustal)
- Estimate a model of sequence evolution for each of your alignments (DT-ModSel, ModelTest, a couple of R libraries)
- If you are using genetic distances, correct the distances using the model you inferred (several libraries in R, PAUP*), then build a tree from the corrected distance matrix, e.g. with neighbor-joining. For ML and Bayesian (likelihood-based) approaches, estimate trees using the inferred model (GARLI, BEAST, RAxML, PAUP*)
- Evaluate the trees using some measure of statistical support, most often parametric/nonparametric bootstraps or posterior probabilities (usually computed by the same program you used to build the tree)
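To make the distance-correction step concrete, here is a small sketch using the Jukes-Cantor (JC69) model, the simplest substitution model, standing in for whatever model you actually selected. It converts the proportion of differing sites p between two aligned sequences into a corrected distance d = -(3/4) ln(1 - 4p/3), skipping gaps and ambiguous bases. The function names are hypothetical; a real analysis would apply the richer model chosen in the model-selection step.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

BASES = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, counting only unambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (equal length)")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in BASES and y in BASES]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4/3 * p)."""
    if p == 0.0:
        return 0.0  # identical sequences (also avoids returning -0.0)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_matrix(aln: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Symmetric JC69 distance for every pair of sequences in the alignment."""
    dist = {}
    for s, t in combinations(sorted(aln), 2):
        d = jc69(p_distance(aln[s], aln[t]))
        dist[(s, t)] = dist[(t, s)] = d
    return dist


if __name__ == "__main__":
    aln = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "ACGTACGTA-"}
    for (s, t), d in sorted(jc69_matrix(aln).items()):
        if s < t:
            print(s, t, round(d, 4))
```

The correction always makes d at least as large as p, because multiple substitutions at the same site are otherwise invisible; for closely related isolates the two values are nearly equal.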
With a lot of data, you will generally be limited to distance-based approaches or RAxML; other approaches will take far too long to complete.
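Since distance methods are the ones that scale, here is a hypothetical, pure-Python sketch of neighbor-joining (Saitou and Nei's algorithm), which turns a matrix of corrected distances into an unrooted tree in Newick format. The function name and the input layout (a list of names plus a symmetric list-of-lists matrix) are assumptions for illustration. This naive version costs O(n^3), which is still small for a couple of hundred genomes; it does not clamp negative branch lengths, which neighbor-joining can produce.

```python
from itertools import combinations
from typing import List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Return an unrooted Newick string built by neighbor-joining.

    names: taxon labels; dist: symmetric matrix of pairwise distances.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # Working state: node id -> Newick subtree, and id -> id -> distance.
    tree = {i: name for i, name in enumerate(names)}
    d = {i: {j: dist[i][j] for j in tree} for i in tree}
    next_id = len(names)

    while len(tree) > 3:
        n = len(tree)
        # Row sums over the nodes still active.
        r = {i: sum(d[i][k] for k in tree if k != i) for i in tree}
        # Join the pair that minimises the Q criterion.
        i, j = min(combinations(tree, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to the new internal node u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = next_id
        next_id += 1
        d[u] = {}
        for k in tree:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        tree[u] = f"({tree[i]}:{li:.4f},{tree[j]}:{lj:.4f})"
        del tree[i], tree[j]

    # Connect the last three nodes to a single central node (unrooted tree).
    a, b, c = tree
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({tree[a]}:{la:.4f},{tree[b]}:{lb:.4f},{tree[c]}:{lc:.4f});"


if __name__ == "__main__":
    names = ["a", "b", "c", "d", "e"]
    m = [[0, 5, 9, 9, 8],
         [5, 0, 10, 10, 9],
         [9, 10, 0, 8, 7],
         [9, 10, 8, 0, 3],
         [8, 9, 7, 3, 0]]
    print(neighbor_joining(names, m))
```

Ties in the Q criterion are broken by input order here, so different implementations can return different (equally valid) topologies for the same matrix.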
Phylogenetics is one of the more complicated subfields of biology (actually, it's an extension of graph theory in mathematics), and a thorough discussion of methods and theory is beyond the scope of a simple post. I hope this gives you a decent place to start.
This should also answer some of the questions in your other post, "Is it possible to build phylogenetic trees from a set of fasta and gene bank format files, and what are the most appropriate tools for that?"