8.2 years ago by
University of Nebraska
I see this question as an extension of the question you just asked and it sounds like you are on the right track.
One comment on your order of operations above: I would never compute a phylogenetic tree until I have an alignment and, in my opinion, a curated alignment. I also think that, during curation, every alignment should be inspected and edited by eye.
Alignment programs are far from perfect. People are not perfect at aligning sequences either; computers are a lot faster, but they are also error prone. I see a lot of pipelines designed to go directly from BLAST to CLUSTAL W or MUSCLE, then to automatic trimming of sequences (what you are calling "chop the region of interest", correct?), and then straight into constructing a phylogeny without anyone looking at the alignment. I think this is a recipe for danger.
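As an aid to looking at the alignment (not a replacement for it), here is a minimal sketch that flags very gappy sequences and columns so you know where to look first. The function names, the toy alignment, and the 0.5 gap threshold are all made up for illustration; pick a threshold that suits your own data.

```python
def read_fasta(text):
    """Parse aligned FASTA text into a dict of {name: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def gap_report(records, max_gap_fraction=0.5):
    """Return (sequence names, column indices) whose gap fraction exceeds the threshold.

    The 0.5 default is an arbitrary placeholder, not a recommended value.
    """
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("sequences are empty or differ in length; is this an alignment?")
    n_cols = lengths.pop()
    gappy_seqs = [
        name for name, seq in records.items()
        if seq.count("-") / n_cols > max_gap_fraction
    ]
    gappy_cols = [
        i for i in range(n_cols)
        if sum(seq[i] == "-" for seq in records.values()) / len(records) > max_gap_fraction
    ]
    return gappy_seqs, gappy_cols


# Toy alignment, invented for this example.
alignment = """>seqA
MKT-LLVA
>seqB
MKTALLVA
>seqC
M-----VA
"""

gappy_seqs, gappy_cols = gap_report(read_fasta(alignment))
print("Sequences to inspect:", gappy_seqs)
print("Columns to inspect (0-based):", gappy_cols)
```

A flagged sequence or column is only a prompt to open the alignment and look; whether it is a real indel, a fragment, or a misalignment is still a judgment call.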
Given what I said above, I think you really need to grasp the diversity in your homologous domain in order to know how to inspect the alignment and choose which sequences show positional homology within that alignment. Some of your BLAST hits to your domain of interest could be due to chance (have you decided on an E-value cutoff?) or to convergent evolution. Also, you study fungi, as I do, and we are becoming more and more aware of how prevalent horizontal gene transfer events are in the fungi. The process of understanding homology is subtle and time consuming. If you need to read up on homology and phylogenetic theory, I would suggest the new book by Baum & Smith: Tree Thinking.
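On the E-value cutoff: a minimal sketch of applying one to BLAST results, assuming you saved them in BLAST+ tabular format (`-outfmt 6`) with the default 12 columns, where column 2 is the subject ID and column 11 is the E-value. The function name, the example rows, and the 1e-5 cutoff are placeholders for illustration, not a recommendation.

```python
import csv
import io


def filter_blast_hits(handle, max_evalue=1e-5):
    """Yield (subject_id, evalue) for hits at or below the E-value cutoff.

    Assumes default 12-column BLAST+ tabular output (-outfmt 6).
    The 1e-5 default is an arbitrary placeholder.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= max_evalue:
            yield row[1], evalue


# Two invented hits: one strong, one that could easily be due to chance.
example = (
    "dom1\tseqA\t85.0\t120\t18\t0\t1\t120\t10\t129\t1e-40\t150\n"
    "dom1\tseqB\t32.0\t60\t40\t1\t5\t64\t200\t259\t0.5\t28\n"
)

hits = list(filter_blast_hits(io.StringIO(example)))
print(hits)
```

Passing a cutoff says the hit is unlikely to be due to chance; it does not by itself rule out convergence or horizontal transfer, which is why the hits still need to be examined in the alignment.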
My standard is simple to state: determine whether the sequences are truly homologous. My answer here is long because this is not a trivial exercise.
My strategy is to first take my domain of interest and BLAST it against my database of interest (are you just interested in the fungi or in all organisms?). I then align the sequences (for me this may mean all hits to all sequenced fungal genomes, all the sequences in NCBI, or a single genus) using one of the many available programs. I inspect and edit the alignment in a text editor, checking for obvious homologies, misaligned regions, etc. I construct a phylogenetic tree in lots of different ways, using lots of different methods and programs (see Joe Felsenstein's list here and my recent opinion here), according to how much time I have and the stage of the project. I look at the tree, maybe do another BLAST, add sequences, remove sequences, look for pseudogenes, clades showing long branch attraction, etc., and continue the process. A phylogeny is never perfect; it is an estimate and a hypothesis, but some phylogenies are better than others.
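For the "look at the tree" step, one crude way to get a list of tips worth a second look is to flag terminal branches that are much longer than the typical one. This is a rough sketch, assuming a simple Newick string with unquoted taxon names and branch lengths; the regular expression, the function name, the toy tree, and the factor of 3 are all invented for illustration. It does not detect long branch attraction itself, it only points at unusually long tips.

```python
import re
from statistics import median

# A leaf is a name:length pair that directly follows "(" or ",".
# Internal node labels follow ")" and are therefore skipped.
LEAF = re.compile(r"(?<=[(,])\s*([^(),:;\s]+):([0-9.eE+-]+)")


def long_terminal_branches(newick, factor=3.0):
    """Return (name, length) for tips longer than factor * median tip length.

    The factor of 3 is an arbitrary placeholder. Quoted Newick labels
    are not handled by this sketch.
    """
    leaves = [(name, float(length)) for name, length in LEAF.findall(newick)]
    if not leaves:
        return []
    typical = median(length for _, length in leaves)
    return [(name, length) for name, length in leaves if length > factor * typical]


# Toy tree, invented for this example; D sits on a very long branch.
tree = "((A:0.10,B:0.12)0.95:0.05,(C:0.11,D:0.90)0.80:0.04,E:0.09);"

suspects = long_terminal_branches(tree)
print(suspects)
```

A flagged tip might be a pseudogene, a misaligned sequence, a non-homologous BLAST hit, or a genuinely fast-evolving lineage, so the next step is to go back to that sequence in the alignment rather than to delete it automatically.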
modified 8.2 years ago
Josh Herr