Multiple Sequence Alignment for cDNAs

Haibao Tang

Now I have a collection of sequences that are derived from cDNAs, but most of them are partial. I can start aligning them, but I'd prefer to use the protein sequences as the guide of alignments (so that indels are multiples of three).
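
A quick way to check whether a nucleotide alignment actually keeps indels in whole codons is to scan each aligned row for runs of gap characters. The sketch below is only an illustration, not a feature of any program mentioned in this thread. It assumes `-` is the gap character and that column 1 of the alignment is the first position of a codon; `is_codon_aligned` and `report` are hypothetical names.

```python
import re


def is_codon_aligned(aligned_seq: str, gap_char: str = "-") -> bool:
    """Return True if every gap run starts at a codon boundary and spans whole codons.

    Assumes alignment column 0 is the first position of a codon.
    """
    for run in re.finditer(re.escape(gap_char) + "+", aligned_seq):
        if run.start() % 3 != 0 or len(run.group()) % 3 != 0:
            return False
    return True


def report(alignment: dict[str, str]) -> dict[str, bool]:
    """Check every row of an alignment given as {name: aligned_sequence}."""
    return {name: is_codon_aligned(seq) for name, seq in alignment.items()}


if __name__ == "__main__":
    # seqB has a 2-nt gap starting mid-codon, so it fails the check.
    print(report({"seqA": "ATG---AAATGG", "seqB": "ATGG--CAAATG"}))
```

Checking only the gap length would accept a 3-nt gap that splits two codons, which is why the start position is tested as well.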

With scripting, perhaps I can predict the peptide from cDNA, align the peptides using CLUSTALW, and convert the peptide alignment back to the DNA alignment - but that's a bit of work.
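
The "convert the peptide alignment back" step is mostly bookkeeping: walk along each aligned protein, emit the next codon of that sequence's CDS for every residue, and emit `---` for every gap. A minimal sketch, assuming each cDNA has already been trimmed to an in-frame CDS (for partial cDNAs the reading frame has to be found first, which this does not do), that only the standard genetic code is needed, and that the protein alignment itself comes from an external aligner. `translate` and `thread_codons` are hypothetical helper names.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame CDS; an incomplete trailing codon is dropped, unknown codons become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Replace each residue of an aligned protein with its codon from cds, and each gap with three gaps."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    out = []
    k = 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)
            continue
        if k >= len(codons):
            raise ValueError("protein alignment has more residues than the CDS has codons")
        codon = codons[k]
        # Sanity check: the codon must encode the residue it is replacing.
        if CODON_TABLE.get(codon, "X") != aa.upper():
            raise ValueError(f"codon {k + 1} ({codon}) does not encode {aa!r}")
        out.append(codon)
        k += 1
    # Leftover codons (e.g. a terminal stop not present in the protein alignment) are dropped.
    return "".join(out)


if __name__ == "__main__":
    cds = {"a": "ATGGCTAAATGG", "b": "ATGAAATGG"}
    proteins = {name: translate(seq) for name, seq in cds.items()}  # {'a': 'MAKW', 'b': 'MKW'}
    # Stand-in for the aligner's output on `proteins`:
    aligned = {"a": "MAKW", "b": "M-KW"}
    for name in cds:
        print(name, thread_codons(aligned[name], cds[name]))
```

Checking each codon against its residue catches the most common mistake in this kind of script: a CDS that was translated in one frame but threaded in another.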

Is there any alignment program that is aware of codons?

score 10 · Answer 1 · 2010-09-24

Rm

Some useful programs:

Pairwise Align Codons accepts two coding sequences and determines the optimal global alignment.

Create a codon alignment based on an existing protein multiple sequence alignment using PAL2NAL or RevTrans.

SQUINT: a multiple alignment program and editor

SNAP calculates synonymous and non-synonymous substitution rates based on a set of codon-aligned nucleotide sequences.
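
To make concrete what "synonymous" and "non-synonymous" mean on a codon alignment, the sketch below simply counts differing codon pairs and asks whether they encode the same amino acid under the standard genetic code. This is a deliberate simplification, not SNAP's method: as far as I know SNAP follows the Nei-Gojobori approach, which normalizes by the number of synonymous and non-synonymous sites and resolves codons that differ at more than one position, whereas here every differing codon counts once. `count_codon_differences` is a hypothetical name.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codon_differences(seq1: str, seq2: str) -> tuple[int, int]:
    """Count (synonymous, non-synonymous) codon differences between two codon-aligned rows.

    Each differing codon counts once, regardless of how many positions differ;
    codons containing gaps or ambiguity codes are skipped.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("rows must be the same length and a multiple of 3")
    syn = nonsyn = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2 or c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1  # same amino acid, e.g. GCT -> GCC (both Ala)
        else:
            nonsyn += 1  # amino acid changes, e.g. AAA (Lys) -> AGA (Arg)
    return syn, nonsyn


if __name__ == "__main__":
    print(count_codon_differences("ATGGCTAAATGG", "ATGGCCAGATGG"))
```

These raw counts are only meaningful on a codon-aware alignment; on an alignment with frame-breaking gaps the codons being compared are not homologous.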

I would add that, in general, you shouldn't use ClustalW for your protein alignments. There are many better and faster options out there. As soon as your sequences show any appreciable divergence, ClustalW starts giving incorrect alignments and grossly underestimates the number of indels. MUSCLE, MAFFT, PRANK, FSA and ProbCons are all good alignment programs that do a much better job. HMMER3 is an option if you happen to have a seed alignment or are aligning a single-domain protein.

(reply by DG)

Ram · Answer 2 · 2010-09-24

I like this Perl program: unlike some other scripts, it does the hard work for you, i.e. you don't have to provide amino acid alignment files. It also does some basic checking for invalid protein-coding sequences (frameshifts, etc.).

transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences

Olaf RP Bininda-Emonds

BMC Bioinformatics 2005, 6:156 doi:10.1186/1471-2105-6-156

http://www.molekularesystematik.uni-oldenburg.de/33997.html#Sequences
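
The kind of basic validity checking mentioned above can be approximated in a few lines. This is a hypothetical sketch of such a check, not transAlign's actual logic: it assumes the standard genetic code and flags non-nucleotide characters, a length that is not a multiple of 3, and stop codons anywhere except the last complete codon. `cds_problems` is an illustrative name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def cds_problems(cds: str) -> list[str]:
    """Return a list of reasons why cds may not be a clean, in-frame coding sequence."""
    seq = cds.upper()
    problems = []
    odd_chars = sorted(set(seq) - set("ACGTN"))
    if odd_chars:
        problems.append("unexpected characters: " + "".join(odd_chars))
    if len(seq) % 3 != 0:
        problems.append(f"length {len(seq)} is not a multiple of 3 (frameshift or truncated end?)")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # A stop codon is only acceptable as the final complete codon.
    for n, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"internal stop codon {codon} at codon {n}")
    return problems


if __name__ == "__main__":
    for s in ["ATGAAATGA", "ATGTAAAAA", "ATGAAAT"]:
        print(s, cds_problems(s))
```

For partial cDNAs the length test is only a hint: a fragment cut mid-codon at its 3' end is not necessarily frameshifted, whereas an internal stop in the chosen frame is a much stronger warning sign.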

Ram · Answer 3 · 2010-09-24

2184687-1231-83-

PRANK has a codon model to align CDS sequences.

PRANK has since moved here: http://code.google.com/p/prank-msa/

(reply by Biojl)

score 2 · Answer 4 · 2016-05-06

Hello,

I align everything at the amino acid level first, using Clustal Omega (better than any other alignment program for amino acids; https://dx.doi.org/10.1038%2Fmsb.2011.75). NB: this is NOT ClustalW, nor ClustalX. Then I use CodonAlign 2.0 (http://wbiomed.curtin.edu.au/bioinf/CodonAlign.php) with the nucleotide sequences from which the proteins were translated, to align the nucleotides to the amino acid alignment.

You could also use MEGA: import the nucleotide sequences, translate them, align, and reverse-translate.

Andor

score 1 · Answer 5 · 2010-09-27

I do not know exactly what your input cDNA set is (same species, closely related species, some crazily variable tunicate, different species), but as a general rule I would, where possible, stick to aligning sequences from the same or closely related species at the nucleotide level. Otherwise you may end up merging several paralogues into one artifactual protein.

If you really must, try clustering the protein sequences first with CD-HIT or UCLUST. ClustalW will align any set of protein sequences without complaint, whether or not they belong together, so some prescreening should take care of that.
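
CD-HIT clusters greedily: sequences are processed longest first, and each one either joins the first cluster whose representative it matches above a threshold or becomes the representative of a new cluster (UCLUST uses a similar greedy scheme). The toy sketch below shows only that idea and is not either tool's algorithm. As a stand-in for alignment-based percent identity it uses the fraction of the query's k-mers (k = 3, an arbitrary choice) found in the representative, so its `threshold` is not comparable to CD-HIT's `-c` identity cutoff. `kmers` and `greedy_cluster` are hypothetical names.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.5, k: int = 3) -> dict[str, list[str]]:
    """Greedy incremental clustering, longest sequence first.

    Returns {representative_name: [member names]}; similarity is the fraction of the
    query's k-mers shared with a representative (a crude proxy for identity).
    """
    clusters: dict[str, list[str]] = {}
    rep_kmers: dict[str, set[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        query = kmers(seqs[name], k)
        for rep, rk in rep_kmers.items():
            if query and len(query & rk) / len(query) >= threshold:
                clusters[rep].append(name)  # join the first matching cluster
                break
        else:
            clusters[name] = [name]  # no match: start a new cluster
            rep_kmers[name] = query
    return clusters


if __name__ == "__main__":
    proteins = {
        "p1": "MKTAYIAKQRQISFVKSHFSRQ",
        "p2": "MKTAYIAKQRQISFVKSH",
        "p3": "GGGWWWPPPLLLEEE",
    }
    print(greedy_cluster(proteins))
```

Each resulting cluster can then be aligned separately, which avoids forcing unrelated proteins into one alignment.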

Finally, some programs that are in principle genome aligners are good at taking DNA input and aligning mostly coding sequences; check e.g. LAST: http://last.cbrc.jp/

score 0 · Answer 6 · 2011-07-19

I once had to make a large multiple alignment from EST data for 1 genes across 20 or so plant species, all of which had high copy number and divergence thanks to paleopolyploidization, and I decided the best way to do this was a codon level alignment. Since this was a one-time thing and not part of an analysis pipeline I wanted to maintain the ability to manually edit the alignment as well. I used MEGA 4.0. You can open all your cDNA sequences, toggle them to amino acids, align the aa using built-in Clustal functionality, then toggle back to nucleotides maintaining the aa alignment (which are now codons). Another handy feature was that you can choose to align select regions simply by highlighting the columns you want to align. This allows you to manually edit pesky regions while maintaining the bigger picture.

Hope this helps.