Question

Multiple Sequence Alignment for Tree Building: Advice


10.1 years ago

flowerchild4 • 0

Hi All,

Background: I am trying to build a tree of non-coding DNA sequences that I obtained in my lab, along with similar, related known sequences and outgroups. I am using MEGA. I understand the process of inputting sequence files and the basic concepts of using ClustalW (or Muscle) to align the DNA nucleotides. I also know that downstream I will be using MEGA to determine the best-fit ML model and to build an ML tree with bootstrapping.
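As a rough illustration of what choosing the "model of best fit" involves: information criteria such as BIC and AICc penalise each model's log-likelihood by its number of free parameters, and the model with the lowest score is preferred. The sketch below is not MEGA's code; it assumes you already have each model's log-likelihood and parameter count (for example, read off a model-selection table), and it assumes the number of alignment sites is used as the sample size. The model names and numbers in the usage are hypothetical, not real output.

```python
import math
from typing import Callable, Dict, Tuple

def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: k*ln(n) - 2*lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood

def aicc(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Small-sample corrected AIC: 2k - 2lnL + 2k(k+1)/(n-k-1) (lower is better)."""
    k, n = n_params, n_sites
    if n - k - 1 <= 0:
        raise ValueError("AICc needs more sites than parameters + 1")
    return 2.0 * k - 2.0 * log_likelihood + (2.0 * k * (k + 1)) / (n - k - 1)

def best_model(
    fits: Dict[str, Tuple[float, int]],
    n_sites: int,
    criterion: Callable[[float, int, int], float] = bic,
) -> str:
    """Return the model name whose (lnL, k) pair gives the lowest criterion score."""
    return min(fits, key=lambda name: criterion(fits[name][0], fits[name][1], n_sites))

# Hypothetical (lnL, number of parameters) values, for illustration only.
fits = {"JC": (-5200.0, 1), "HKY+G": (-5100.0, 6), "GTR+G+I": (-5095.0, 11)}
print(best_model(fits, n_sites=800))
```

Here the richer GTR+G+I model has a slightly higher likelihood, but its extra parameters cost more than they gain, so HKY+G scores best under both criteria with these made-up numbers.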

I am also new to this, so correct me if I'm wrong, but after inputting sequence files in MEGA I need to either 1) align with ClustalW/Muscle and then refine the alignment manually; or 2) submit the unaligned file to a program that checks the quality of the sequences/columns, and then use the edited version for tree building.

My question: once my sequences are in MEGA, what is the best way to manually refine the alignment (after ClustalW) and make sure only good-quality sequences and regions are kept? I understand this must be done before moving on to tree building. I've read that Guidance is good for this, but when I tried it I couldn't download the output files in a usable format. Any suggestions for suitable programs (free if possible) would be appreciated! Thank you!
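One quick sanity check before any tree building is to flag individual sequences that are mostly gaps or full of ambiguous bases, since these often come from poor reads. The sketch below is a simple heuristic, not what Guidance, Gblocks or trimAl do; the thresholds (`max_gap_frac`, `max_ambig_frac`) and function names are hypothetical choices for illustration, and it assumes an aligned FASTA where gaps are written as `-`.

```python
from typing import Dict, List, Optional

def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}; name is the first word after '>'."""
    records: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {k: "".join(v) for k, v in records.items()}

def flag_sequences(aln: Dict[str, str], max_gap_frac: float = 0.5,
                   max_ambig_frac: float = 0.05) -> Dict[str, str]:
    """Return {name: reason} for sequences that look too gappy or too ambiguous."""
    flagged: Dict[str, str] = {}
    for name, seq in aln.items():
        length = len(seq)
        if length == 0:
            flagged[name] = "empty"
            continue
        gaps = seq.count("-")
        residues = length - gaps
        # Anything other than A/C/G/T or a gap (N, R, Y, ?, ...) counts as ambiguous.
        ambig = sum(1 for c in seq if c not in "ACGT-")
        if gaps / length > max_gap_frac:
            flagged[name] = f"gap fraction {gaps / length:.2f}"
        elif residues and ambig / residues > max_ambig_frac:
            flagged[name] = f"ambiguous fraction {ambig / residues:.2f}"
    return flagged

aln = read_fasta(">s1\nACGT-ACGT\n>s2\nAC-----GT\n>s3\nACNNNACGT\n")
print(flag_sequences(aln))
```

Flagged sequences are worth inspecting by eye in the MEGA alignment viewer rather than dropping automatically.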


Updated 2.5 years ago by Ram • written 10.1 years ago by flowerchild4

Ram · Answer 1 · 2015-05-26

First, Muscle is faster and more accurate than ClustalW, so I would choose Muscle for alignment.

The amount of care and manual editing you should put in after automated alignment depends on how conserved (and how taxonomically broad) your dataset is. For conserved sequences, a quick check and error correction in an alignment viewer (the one provided by MEGA is fine for this) will be enough. If your sequences have diverged a lot, you need to be more careful, generally excluding regions of dubious alignment quality. Gblocks is commonly used for this, but I prefer TrimAl. I have never used Guidance.
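To make the idea of "excluding regions of dubious alignment quality" concrete, here is a minimal sketch of the simplest kind of filter: dropping alignment columns in which too many sequences have a gap. It is similar in spirit to gap-based filtering in tools like TrimAl, but it is not TrimAl's or Gblocks' algorithm (both use additional criteria such as conservation and block length); the function name and the `max_gap_frac` threshold are hypothetical.

```python
from typing import Dict, List

def trim_gappy_columns(aln: Dict[str, str], max_gap_frac: float = 0.5) -> Dict[str, str]:
    """Keep only columns where the fraction of gap characters is <= max_gap_frac."""
    names = list(aln)
    if not names:
        return {}
    length = len(aln[names[0]])
    if any(len(seq) != length for seq in aln.values()):
        raise ValueError("all aligned sequences must have the same length")
    n = len(names)
    keep: List[int] = []
    for col in range(length):
        gaps = sum(1 for name in names if aln[name][col] == "-")
        if gaps / n <= max_gap_frac:
            keep.append(col)
    # Rebuild every sequence from the retained column indices, preserving order.
    return {name: "".join(aln[name][i] for i in keep) for name in names}

aln = {"a": "AC-GT", "b": "A--GT", "c": "AC-G-"}
print(trim_gappy_columns(aln))
```

The all-gap third column is removed, while columns with a single gap are kept; lowering `max_gap_frac` trims more aggressively at the cost of discarding potentially informative sites.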

If your non-coding DNA has a known secondary structure, you should incorporate that structure into both the automated (Mafft can do this) and the manual alignment steps.

Ram · Answer 2 · 2015-05-26


10.1 years ago

Antonio R. Franco ★ 5.2k

Don't dismiss the relatively new Clustal Omega, which uses the HHalign algorithm (based on hidden Markov models) to perform multiple alignment. I have had many successful experiences with this program.

You can learn more about this program HERE.

Updated 2.5 years ago by Ram • written 10.1 years ago by Antonio R. Franco