Workflow for making a phylogram
7.8 years ago

Dear Community,

Could someone have a look at my workflow for generating a phylogram?

I would like to make a phylogram with 250 unknown nucleotide sequences (300 bp each). I retrieved the sequences with a single primer set from environmental samples and microbial enrichment cultures. I also collected 25 reference sequences from well-described microbial pure cultures deposited at NCBI.

First, I oriented all 275 sequences in the same direction, translated them into amino acids and aligned them using MUSCLE in MEGA7. I trimmed the primers and made sure that all sequences were in frame. In 3 cases I observed single-base-pair deletions. To bring the codons back into frame, I manually added single-nucleotide gaps at the respective sites (apart from these 3 deletions, there were no other insertions or deletions in the 275 sequences). I translated the aligned amino acid sequences back to nucleotides and ran the model test in MEGA7. It suggested the general time-reversible model with gamma-distributed rates and invariant sites (GTR+G+I). I then applied the maximum likelihood method to my alignment. As a phylogeny test I used the bootstrap method with 500 replications. Further options were: GTR model, gamma distributed with invariant sites (G+I), 5 discrete gamma categories (no clue what this means), gaps treated as complete deletion, and NNI (nearest-neighbor interchange) as the ML heuristic method.
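To make the protein-guided alignment step described above concrete, here is a minimal sketch in Python of threading in-frame nucleotide sequences back onto an aligned protein. It is only an illustration of the idea, not what MEGA7 does internally; the function name `backtranslate_alignment` and the toy sequences are hypothetical, and it assumes each nucleotide sequence is ungapped, in frame, and exactly three times as long as its protein.

```python
def backtranslate_alignment(aligned_protein, nucleotides, gap="-"):
    """Thread an ungapped, in-frame nucleotide sequence onto its aligned protein.

    Each amino acid consumes the next codon; each protein gap becomes a
    three-column nucleotide gap, so codon boundaries stay intact.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(nucleotides) != 3 * n_residues:
        raise ValueError("nucleotide length must be 3 x number of residues")

    codons = []
    pos = 0  # index of the next unused nucleotide
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(nucleotides[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Toy data (hypothetical): seqA lacks one codon relative to seqB.
aligned = {"seqA": "MK-L", "seqB": "MKTL"}
cds = {"seqA": "ATGAAACTG", "seqB": "ATGAAGACCCTG"}

codon_alignment = {
    name: backtranslate_alignment(aligned[name], cds[name]) for name in aligned
}
for name, seq in codon_alignment.items():
    print(name, seq)
```

Because gaps are only ever inserted in multiples of three, the resulting nucleotide alignment keeps every sequence in frame, which is the point of aligning at the amino acid level first.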

This tree looks more or less similar to (reliable?) published trees. When I BLASTed the unknown sequences, the results agreed with the nearest reference sequence in the tree. I also tested different nucleotide substitution models, as well as the neighbor-joining and parsimony methods, but all with less success. I also tried Bayesian MCMC with MrBayes (GTR+G+I, 2 independent simultaneous runs, relburnin=yes, burninfrac=0.25, everything else default); however, I struggled to reach convergence (only 0.25 instead of < 0.1) and the trees looked odd.

With the tree, I want to show which organism (reference) is the closest neighbor to each of my unknown sequences, so that I get a phylogenetic affiliation that holds more information than a simple BLAST result with % nucleotide identity.

Do you have any suggestions for this workflow? What would you do to estimate the reliability of a tree?

Thank you very much in advance for your help,

Martin

alignment sequence MEGA phylogram tree • 2.0k views