Gblocks for 16S rRNA Phylogenetics?
12.2 years ago
Brett ▴ 150

I am pretty new to bioinformatics and have been trying to do a phylogenetic analysis of approximately 100 divergent bacteria.

My approach was a MUSCLE alignment followed by a simple maximum likelihood tree. It got most relationships right, but there were a few obvious errors.

A friend told me about Gblocks, and sure enough, after running my data through it I got a better tree.
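The code below shows concretely what "curating with Gblocks" does to an alignment. Gblocks keeps blocks of conserved, gap-poor columns and discards the rest. This is not Gblocks' algorithm, which also applies flanking-position, contiguous-nonconserved-position and minimum-block-length rules. It is a much-simplified, hypothetical column filter. The thresholds (`max_gap_frac`, `min_identity`) are illustrative assumptions, and the point is only to show what kind of information gets thrown away.

```python
from collections import Counter


def filter_columns(alignment, max_gap_frac=0.5, min_identity=0.5):
    """Drop alignment columns that are too gappy or too variable.

    Hypothetical, simplified filter (NOT Gblocks): a column is kept only if
    at most `max_gap_frac` of its characters are gaps and the most common
    non-gap residue occurs in at least `min_identity` of all sequences.
    """
    names = list(alignment)
    seqs = [alignment[name].upper() for name in names]
    n = len(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")

    keep = []
    for i in range(length):
        column = [s[i] for s in seqs]
        residues = [c for c in column if c not in "-."]
        gap_frac = 1 - len(residues) / n
        if not residues or gap_frac > max_gap_frac:
            continue  # too many gaps in this column
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / n >= min_identity:
            keep.append(i)  # conserved enough to keep

    return {name: "".join(s[i] for i in keep) for name, s in zip(names, seqs)}


toy = {
    "seqA": "ACGT-ACGTA",
    "seqB": "ACGTTACG-A",
    "seqC": "ACCT--CGTA",
    "seqD": "AGGT--CGTT",
}
print(filter_columns(toy))
```

On this toy alignment the filter drops only the fifth column, where three of the four sequences have a gap. Real trimming tools weigh many more factors than these two thresholds.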

However, from reading the literature (see the quotation below), it appears that Gblocks is intended only for protein-coding DNA.

Although we have only used protein alignments, the same conclusions are expected to apply to protein-coding DNA alignments of similar divergence. On the other hand, although we predict that the general conclusion that ambiguously aligned regions in any data set are best excluded when they provide more noise than signal, rRNA alignments as well as alignments from noncoding DNA have very different features from coding alignments, and our simulations were not specifically designed to explore the properties of these kinds of sequences. However, our purpose in this work is not giving strict rules about the best alignment strategy and associated parameters.

Essentially, what I am asking is: is curating 16S rRNA data with Gblocks valid? Has anyone used it in the literature before (in a paper I can't find)? Or will I have to find another way to improve my data?

All opinions appreciated

phylogenetics • 3.8k views
12.2 years ago
Cliff Beall ▴ 470

I would try to improve the alignment, rather than exclude regions. My take on Gblocks is that it's designed to deal with more divergent sequences than 16S.

We have come up with the following workflow, which seems to work well. It uses a secondary-structure-aware method that seems to produce fewer obvious misalignments than Clustal.

  • Download a curated 16S alignment from RDP and use it with cmbuild from the Infernal suite to generate a covariance model.
  • Align the sequences of interest using the covariance model with cmalign.
  • Use Biopython's AlignIO to convert the Stockholm output to PHYLIP format.
  • Inspect, trim and, if necessary, adjust the alignment in Mesquite. The main place Infernal seems to have problems is where the end of a sequence falls near a gap in the overall alignment.
  • Generate a tree using RAxML or another program.
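In Biopython, the Stockholm-to-PHYLIP step is a single call: `AlignIO.convert("aligned.sto", "stockholm", "aligned.phy", "phylip-relaxed")`. The file names here are hypothetical. For readers without Biopython, the stdlib sketch below shows what that conversion involves. It assumes a file containing a single Stockholm alignment, such as cmalign writes, and works as follows:

- Markup lines starting with `#` and the `//` terminator are skipped.
- Interleaved blocks are concatenated per sequence name.
- The result is written as relaxed PHYLIP (name, whitespace, sequence).

Rewriting `.` gap characters as `-` is an assumption of this sketch, chosen because `-` is the gap symbol tree programs most commonly expect.

```python
def stockholm_to_phylip(stockholm_text, gap_chars="."):
    """Convert one Stockholm alignment (given as text) to relaxed PHYLIP text."""
    order = []   # sequence names in first-seen order
    pieces = {}  # name -> list of fragments (handles interleaved blocks)
    for raw in stockholm_text.splitlines():
        line = raw.strip()
        # Skip blank lines, the "# STOCKHOLM 1.0" header, #=GF/#=GS/#=GR/#=GC
        # markup lines, and the "//" end-of-alignment terminator.
        if not line or line.startswith("#") or line == "//":
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"unexpected sequence line: {raw!r}")
        name, fragment = parts
        if name not in pieces:
            order.append(name)
            pieces[name] = []
        pieces[name].append(fragment)

    if not order:
        raise ValueError("no sequences found")

    # Assumption of this sketch: rewrite Stockholm gap characters as '-'.
    to_dash = str.maketrans({c: "-" for c in gap_chars})
    seqs = {name: "".join(pieces[name]).translate(to_dash) for name in order}

    length = len(seqs[order[0]])
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("sequences have different aligned lengths")

    lines = [f"{len(order)} {length}"]
    lines += [f"{name} {seqs[name]}" for name in order]
    return "\n".join(lines) + "\n"


example = """# STOCKHOLM 1.0
#=GF ID toy
seqA  ACG.U-A
seqB  ACGgU-A
#=GC SS_cons  <<..>>.
//
"""
print(stockholm_to_phylip(example), end="")
```

To use it on a real file, read the cmalign output with `open(...).read()` and write the returned string to a `.phy` file. Unlike Biopython, this sketch does not validate residue characters or handle files containing several alignments.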

RDP: http://rdp.cme.msu.edu/misc/resources.jsp
Infernal: http://infernal.janelia.org/
Mesquite: http://mesquiteproject.org/mesquite/mesquite.html

12.2 years ago
Joseph Hughes ★ 3.0k

For rRNA, you might want to look into the profile alignment options in MAFFT or Clustal. You will need a reliable profile to align to; the Greengenes Core Set might be useful. You might also find the NAST multiple aligner useful.
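For the MAFFT route, one option is its `--add` mode, which aligns new sequences onto an existing alignment. Adding `--keeplength` keeps the reference alignment's length by deleting insertions that occur only in the new sequences. The helper below is a minimal sketch that builds and runs that command. It assumes MAFFT is on `PATH`. The file names, such as an aligned FASTA of a reference set like the Greengenes Core Set, are hypothetical.

```python
import shutil
import subprocess


def mafft_add_command(reference_alignment, new_sequences, keep_length=True):
    """Build a `mafft --add` command line as a list of arguments."""
    cmd = ["mafft", "--add", new_sequences]
    if keep_length:
        # Keep the reference alignment's column count; insertions present
        # only in the new sequences are deleted.
        cmd.append("--keeplength")
    cmd.append(reference_alignment)
    return cmd


def run_mafft_add(reference_alignment, new_sequences, output_path, keep_length=True):
    """Run MAFFT and write the combined alignment (printed to stdout) to a file."""
    if shutil.which("mafft") is None:
        raise FileNotFoundError("mafft was not found on PATH")
    cmd = mafft_add_command(reference_alignment, new_sequences, keep_length)
    with open(output_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


# Hypothetical file names: an aligned reference set and your unaligned 16S sequences.
print(" ".join(mafft_add_command("core_set_aligned.fasta", "my_16s.fasta")))
```

Keeping the reference length means every new sequence lands in the same column coordinates as the reference profile, at the cost of discarding any insertions unique to the new sequences.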
