Help with phylogenomics
2.1 years ago
Space_Life ▴ 50

Hi, I recently did a comparative genomic analysis of hundreds of bacteria. I have nearly a hundred alignment files of conserved amino acid and nucleotide sequences. There is an overwhelming amount of information on phylogenetic analysis, and an overwhelming number of methods for it. I only have a basic idea of phylogenetic trees; however, I want to make a good case out of this comparative genomic study by having a good evolutionary history analysis. I would welcome suggestions for what could be added to my study. A few things I can think of are:

1. Build a phylogenetic tree using all the conserved sequence alignment files. (I read about concatenating sequences and then building a tree. I tried this in MEGA 11, and it has been running since yesterday on a 28-core high-performance computer. Is that usual? Is there a better option you would suggest? I read about IQ-TREE with ultrafast bootstrapping; is that a faster option?)

2. Build a time tree. (Can I use the concatenated sequence data to make a time tree?)

3. I want evolutionary insights about specific genes (core functions): make a phylogenetic tree and a time tree from that specific gene sequence. For this one I want to be able to explain the notable variations (deletions, substitutions, etc.) that happened in the course of evolution. What tool or type of phylogenetic tree would be able to do this?

I have come across so many tools in the last two days that I think I got more confused. I have installed both MEGA 11 and IQ-TREE. Every tool has tons of customizations and options. I want to set up one pipeline and complete this analysis. I appreciate you reading my long question and taking the time to answer. Thank you.

iqtree Phylogenetics Time-tree evolution Mega • 727 views
2.1 years ago
Mensur Dlakic ★ 27k

I am making some assumptions based on how you described things. None of it is personal.

It seems like you have little to no experience on this subject. That's not a good start for your stated goal to "make a good case out of this comparative genomic study by having a good evolutionary history analysis." Asking for help here is only slightly better, as it is unlikely that someone can guide you through all the steps that are needed to do this properly. To give you a basic idea, you will need to do most of these steps (and likely some more): 1) collect homologous sequences; 2) make alignments; 3) trim alignments; 4) concatenate alignments; 5) run phylogenetic tree reconstructions; 6) display and interpret the trees. It seems like you have some idea about the difficulty, but let me reiterate: each of these steps has a variety of programs that can be used, and it is easy to get tripped up along the way (objective difficulties compounded by subjective errors). Even if you are willing to learn all the steps and do everything yourself, either it is not going to be fast or, given the scale of the problem, it is not going to be done correctly. Most likely some combination of both. I suggest you find someone who knows how to do this, offer them authorship on the paper, and learn along the way.
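To make step 4 concrete, here is a minimal Python sketch of what concatenation does. It is only an illustration, not what MEGA or any pipeline does internally: the function names `parse_fasta` and `concatenate` are made up for this example, and it assumes every gene alignment is in FASTA format and that the same taxon label is used for an organism in every file. Organisms missing from a gene are padded with gaps.

```python
def parse_fasta(text):
    """Return a {name: sequence} dict from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the taxon label.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {taxon: "".join(parts) for taxon, parts in records.items()}


def concatenate(alignments, missing="-"):
    """Join per-gene alignments into one supermatrix.

    alignments: list of {taxon: aligned_sequence} dicts, one per gene.
    Taxa absent from a gene are padded with `missing` characters.
    Returns (supermatrix, partitions); partitions holds the 1-based
    (start, end) column range occupied by each gene.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment differ in length")
        width = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, missing * width))
        partitions.append((start, start + width - 1))
        start += width
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy example: two tiny "gene" alignments with partly overlapping taxa.
gene1 = parse_fasta(">A\nMKT-\n>B\nMKTL\n")
gene2 = parse_fasta(">A\nGG\n>C\nGA\n")
matrix, parts = concatenate([gene1, gene2])
for taxon, seq in matrix.items():
    print(f">{taxon}\n{seq}")
print(parts)
```

The partition ranges are worth keeping, because many tree-building programs can fit a separate model to each gene if they are told where one gene ends and the next begins.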

If you still want to give it a go on your own, here is a pipeline:

https://github.com/AstrobioMike/GToTree

A couple of basic steps and my suggestions: use proteins rather than DNA, as they often recapitulate relationships better. Protein sequences tend to be better conserved than DNA sequences because of codon degeneracy. If you have closely related species or strains of the same species, DNA might do a better job. You will most likely need a set of single-copy markers as a starting point, and such sets can be found by Googling. There are different sets ranging from 16 conserved ribosomal proteins up to ~120 markers for the domains Archaea and Bacteria. For each of your species you will need to identify whether it has the markers, collect them, align them, and in general follow the steps I described. The pipeline above will handle most of that automatically. For tree-building, the two most often used approaches center around Maximum Likelihood (ML) or Bayesian estimation. The former are generally faster (IQ-TREE is in that category) and the latter tend to be more comprehensive. I really can't explain this in any greater detail here because of the topic's complexity, but it is not difficult to find more information by Googling.
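As a rough illustration of an ML run, the Python sketch below only assembles an IQ-TREE command line; it does not run anything. The options shown (`-s` alignment, `-m MFP` for automatic model selection, `-B` for ultrafast bootstrap replicates, `-T` for threads, `--prefix` for output names) are from IQ-TREE 2, and older releases spell some of them differently, so check them against `iqtree2 --help` for your installed version. The helper name `build_iqtree_command` and the file names are hypothetical.

```python
import shlex


def build_iqtree_command(alignment, bootstraps=1000, threads="AUTO",
                         prefix=None, binary="iqtree2"):
    """Assemble an IQ-TREE 2 command as an argument list (nothing is executed)."""
    cmd = [
        binary,
        "-s", alignment,        # input alignment (e.g. a concatenated FASTA)
        "-m", "MFP",            # ModelFinder Plus: pick a model, then build the tree
        "-B", str(bootstraps),  # ultrafast bootstrap replicates
        "-T", str(threads),     # thread count, or AUTO to let IQ-TREE decide
    ]
    if prefix:
        cmd += ["--prefix", prefix]  # prefix for all output files
    return cmd


# Hypothetical file names, for illustration only.
command = build_iqtree_command("concatenated.faa", prefix="core_tree")
print(shlex.join(command))
```

Once the printed command looks right, the same list can be passed to `subprocess.run(command, check=True)` on a machine where IQ-TREE is installed.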

To answer your questions briefly:

  1. You told us the height of your alignment (presumably hundreds of sequences matching your organisms) but not the width. If a concatenated alignment runs to thousands of residues, or more than 10,000, it could take a long time to get a tree. I don't want to pass a strong opinion without having time to defend it, but MEGA is considered a user-friendly tool for phylogenetic reconstructions and is not necessarily the most rigorous choice for research purposes. That's not to say that it isn't reliable, and in particular I have had very little interaction with the program since version 6.

  2. I assume you mean a dated tree here. Aside from whether this is something that will really advance your paper, I suggest you do basic tree reconstructions first.

  3. Pretty much the same answer as in #2. And if I understand correctly what you want here, there is no tool that automatically describes altered functionality based on insertions and deletions, especially if these changes are small (as is often the case).
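On the height and width mentioned in #1: both numbers are easy to read off an alignment file, and they are the first thing to report when asking why a run is slow. A minimal sketch, assuming a FASTA-formatted alignment; the function name `alignment_shape` is made up for this example.

```python
def alignment_shape(fasta_text):
    """Return (height, width): number of sequences and number of aligned columns."""
    seqs = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            seqs.append("")          # start a new sequence at each header
        elif line and seqs:
            seqs[-1] += line         # sequences may be wrapped over several lines
    widths = {len(seq) for seq in seqs}
    if len(widths) > 1:
        raise ValueError("not an alignment: sequences differ in length")
    return len(seqs), (widths.pop() if widths else 0)


# Toy alignment: three sequences, six columns.
example = ">sp1\nMKT-LV\n>sp2\nMKTALV\n>sp3\nMK--LV\n"
height, width = alignment_shape(example)
print(f"{height} sequences x {width} columns")
```

For a real file, read it with `open(path).read()` and pass the text in; the length check also catches files that were never aligned in the first place.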

Finally, a rather comprehensive list of programs performing various steps of phylogenetic analyses:

https://evolution.genetics.washington.edu/phylip/software.html


Thank you, Dr. Dlakic. I absolutely agree with you, and I certainly needed the insights I got from you. Getting into detailed phylogenomics will certainly take time, and it is probably better to involve someone with good experience. I was actually thinking of adding just a basic overview to my current paper, as I already have enough data. As I read more, I started getting more involved, and maybe I will go into just a little more detail on this. The steps you mentioned are very helpful. I think I got up to the 3rd step. I did the concatenation as well with MEGA, but I am going to try it again once I go through the tools you have suggested. I am going through more papers. I hope to make a basic tree first, followed by a dated one. I will get back with more questions soon. Thank you.
