Which Is The Most Accurate Method To Align Multiple Nucleotide Sequences Without Prior Information?
14.1 years ago
Michael Barton ★ 1.9k

What tool or method do you recommend for aligning multiple nucleotide sequences? There seem to be many options, as new tools keep being released with different refinements to the alignment algorithm. Assuming there is no prior information, such as a phylogeny or an HMM profile, which tool or method will produce the most accurate nucleotide sequence alignment?

alignment sequence • 3.9k views

Do you mean that the method may use only sequence similarity, and that extracting evolutionary information is not permitted?


I mean aligning sequences that you know are homologs, but about which you don't really have any other information.

14.1 years ago
Andreas ★ 2.5k

That depends on the type of nucleic acid sequence:

If you are talking about structural / non-coding RNA, you should use a program that takes structure into account. Have a look at CentroidAlign (Hamada et al., 2009), MAFFT X-INS-i (Katoh & Toh, 2008), R-Coffee (Wilm et al., 2008), and the benchmarks and references therein. It also depends on how many sequences you want to align (more than 50 will be tricky) and how long they are (ribosomal RNAs are a challenge).

It's hard to test alignment programs on DNA sequences. If you know these sequences code for protein, you are better off translating them into protein first and aligning them afterwards.
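To illustrate the translate-then-align idea, here is a minimal Python sketch. It assumes in-frame coding sequences and the standard genetic code, and it takes the gapped protein rows as given (in practice they would come from whichever protein aligner you choose). The function names `translate` and `thread_codons` and the toy sequences are made up for this example.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame coding sequence; unknown codons become 'X'."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna), 3))


def thread_codons(aligned_protein, dna):
    """Map a gapped protein row back onto its DNA, one codon per residue."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    if len(aligned_protein.replace("-", "")) != len(codons):
        raise ValueError("protein row and DNA sequence disagree in length")
    remaining = iter(codons)
    # Each protein gap becomes a three-base gap, so the reading frame is kept.
    return "".join("---" if aa == "-" else next(remaining) for aa in aligned_protein)


# Toy input (hypothetical sequences).
dna = {"a": "ATGGCTAAATGG", "b": "ATGAAATGG"}
proteins = {name: translate(seq) for name, seq in dna.items()}

# Hypothetical protein alignment, as a protein aligner might return it.
aligned_proteins = {"a": "MAKW", "b": "M-KW"}

codon_alignment = {
    name: thread_codons(aligned_proteins[name], dna[name]) for name in dna
}
for name, row in codon_alignment.items():
    print(name, row)
```

Because gaps are only ever inserted as whole codons, the resulting nucleotide alignment cannot contain frameshifting gaps.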

As a rule of thumb for all types of sequences: consistency-based methods are very good. That includes T-Coffee (Notredame et al., 2000) and its variations, ProbCons (Do et al., 2005), and also many of the MAFFT implementations (Katoh et al., 2005). MUSCLE (Edgar, 2004) is also generally very good.
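As a practical starting point, one of the tools above can be driven from a script. The sketch below assumes a `mafft` executable is on your PATH; `--auto` lets MAFFT pick a strategy, while `--localpair --maxiterate 1000` is the invocation the MAFFT manual gives for its L-INS-i mode. The wrapper names `mafft_command` and `run_mafft` and the input file name are invented for this example.

```python
import subprocess


def mafft_command(fasta_path, accurate=False):
    """Build the MAFFT command line for an unaligned FASTA file."""
    if accurate:
        # L-INS-i: slower, intended for smaller data sets.
        return ["mafft", "--localpair", "--maxiterate", "1000", fasta_path]
    return ["mafft", "--auto", fasta_path]


def run_mafft(fasta_path, accurate=False):
    """Run MAFFT and return the aligned FASTA text it writes to stdout."""
    result = subprocess.run(
        mafft_command(fasta_path, accurate),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if MAFFT exits with an error
    )
    return result.stdout


# Usage, assuming a file named homologs.fasta exists:
# aligned = run_mafft("homologs.fasta", accurate=True)
```

Which mode is appropriate depends on the number and length of your sequences, so treat the `accurate` switch as a convenience rather than a recommendation.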


Thank you. This is a nice review of the possible options of sequence alignment.

14.1 years ago
Paulo Nuin ★ 3.7k

I cannot vouch for any alignment package regarding DNA, but in my 2006 paper, MAFFT and ProbCons were the most accurate, with MUSCLE a little behind. We tested more than 30,000 multiple sequence alignments.

Last year I saw a presentation by Nick Goldman from the EBI, where he showed tests of his package, PRANK, which can use phylogenetic information in the alignment. According to his tests it achieves very good accuracy, but I haven't seen any third-party test of the package.


Thanks Paulo. It's good to know which ones perform best.


I still plan to simulate DNA and test the programs again. If I had time ...

14.0 years ago
Yannick Wurm ★ 2.5k

Use protein information if you can.

Here's a nice little review with advice on how to do things: http://genomebiology.com/2010/11/4/R37


And what about when the DNA sequence is not translated, as in some of Andreas' examples?


"Use protein information if you can": why? Can you elaborate a bit?


I think protein information is useful because there is less ambiguity about how to align each position. For DNA, incorrect alignments can predict insertions/deletions that result in frameshifts.
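A quick way to see the frameshift problem in a nucleotide alignment of coding sequences is to look for gap runs whose length is not a multiple of three. The following is a minimal sketch; the function name `frameshifting_gaps` and the example rows are made up, and it makes no attempt to distinguish terminal gaps (which may just reflect missing sequence) from internal ones.

```python
import re


def frameshifting_gaps(aligned_row):
    """Return (start, length) for each gap run whose length is not a multiple of 3."""
    return [
        (match.start(), len(match.group()))
        for match in re.finditer(r"-+", aligned_row)
        if len(match.group()) % 3 != 0
    ]


# A whole-codon gap keeps the reading frame intact.
print(frameshifting_gaps("ATG---AAATGG"))

# Gaps of length 1 and 2 would each imply a frameshift.
print(frameshifting_gaps("ATG-AAA--TGG"))
```

An alignment built at the protein level and threaded back onto the codons never produces such gaps, which is one reason to prefer it for coding sequences.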
