Question: Multiple Sequence Alignment for tree building Advice
5.2 years ago, flowerchild4 (United States) wrote:

Hi All, 

Background: I am trying to build a tree from non-coding DNA sequences obtained in my lab, along with appropriately related known sequences and outgroups. I am using MEGA. I understand how to input sequence files and the basic concepts of aligning DNA nucleotides with ClustalW (or Muscle). I also know that downstream I will use MEGA to determine the best-fit ML model and to build an ML tree with bootstrapping.

I am also new to this, so correct me if I am wrong, but after inputting sequence files in MEGA, I need to either 1) align with ClustalW/Muscle and then adjust the alignment manually; or 2) submit the unaligned file to a program that checks the quality of the sequences/columns, and then use the edited version for tree building.

My question: Once my sequences are in MEGA, what is the best way to manually adjust the alignment (after ClustalW) and ensure the quality of sequences and regions? I understand this must be done before moving forward with tree building. I've read that Guidance is good for this, but when I tried it I couldn't download the files in an appropriate file format. Any suggestions for appropriate programs (free if possible) would be appreciated! Thank you!

5.2 years ago, h.mon wrote:

First, Muscle is generally faster and more accurate than ClustalW, so I would choose Muscle for the alignment.

How much care and manual editing you need after automated alignment depends on how conserved (and how taxonomically broad) your dataset is. For conserved sequences, a quick check and error correction in an alignment viewer (the one provided by MEGA is just fine for this) will be enough. If your sequences have diverged a lot, you need to be more careful, generally by excluding regions of dubious alignment quality from the alignment. Gblocks is commonly used for this, but I prefer trimAl. I have never used Guidance.
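To make the idea of excluding dubious regions concrete, here is a minimal Python sketch that drops alignment columns made up mostly of gaps. It is only a simplified stand-in: Gblocks and trimAl apply more elaborate criteria, and this is not their algorithm. The function names (`read_fasta`, `trim_gappy_columns`), the `max_gap_fraction` parameter, and the example sequences are all hypothetical and chosen purely for illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into an ordered list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def trim_gappy_columns(records, max_gap_fraction=0.5):
    """Keep only columns whose fraction of '-' characters is <= max_gap_fraction.

    Illustrative filter only (hypothetical helper); real trimming tools
    use additional signals beyond the gap fraction.
    """
    if not records:
        return []
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences are not aligned to the same length")
    n = len(records)
    keep = [
        col for col in range(length)
        if sum(seq[col] == "-" for _, seq in records) / n <= max_gap_fraction
    ]
    return [(name, "".join(seq[c] for c in keep)) for name, seq in records]


# Made-up toy alignment: the fifth column is all gaps and gets removed.
example = """>seq1
ACGT-ACG
>seq2
ACG--ACG
>seq3
AC-T-ACG
"""

trimmed = trim_gappy_columns(read_fasta(example), max_gap_fraction=0.5)
for name, seq in trimmed:
    print(f">{name}\n{seq}")
```

A lower `max_gap_fraction` trims more aggressively. For divergent datasets it is worth comparing trees built from the trimmed and untrimmed alignments before settling on a threshold.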

If your non-coding DNA has a known secondary structure, you should incorporate that structure into the alignment steps, both automated (MAFFT can do this) and manual.

5.2 years ago, Antonio R. Franco (Spain, Universidad de Córdoba) wrote:

Don't overlook the relatively new Clustal Omega, which uses the HHalign algorithm (based on hidden Markov models) for multiple alignment. I have had many successful experiences with this program.

You can learn more about this program HERE
