Question: Multiple Sequence Alignment For cDNAs
10.3 years ago by
Haibao Tang3.0k
Mountain View, CA
Haibao Tang3.0k wrote:

Now I have a collection of sequences that are derived from cDNAs, but most of them are partial. I can start aligning them, but I'd prefer to use the protein sequences as the guide of alignments (so that indels are multiples of three).

With scripting, perhaps I can predict the peptide from cDNA, align the peptides using CLUSTALW, and convert the peptide alignment back to the DNA alignment - but that's a bit of work.
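The last step of that scripted route (converting the peptide alignment back to DNA) can be sketched in a few lines of Python. This is only an illustration: the function name `thread_codons` and the toy sequences are made up for the example, and it assumes each CDS starts in frame and that the aligned peptide is exactly its translation (nothing is verified here).

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Lay the codons of an unaligned CDS onto its aligned protein row.

    Assumes the CDS starts in frame 0 and encodes the protein residue by
    residue; a trailing stop codon, if present, is simply ignored.
    """
    residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(cds) < 3 * residues:
        raise ValueError("CDS is too short for the aligned protein")
    out = []
    pos = 0  # offset of the next unused codon in the CDS
    for ch in aligned_protein:
        if ch == gap:
            out.append(gap * 3)  # one protein gap becomes three nucleotide gaps
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


# Toy data (hypothetical): a two-sequence protein alignment and the matching CDS.
protein_alignment = {"seqA": "MK-LV", "seqB": "MKTLV"}
cds = {"seqA": "ATGAAACTGGTT", "seqB": "ATGAAAACCCTGGTT"}

codon_alignment = {
    name: thread_codons(row, cds[name]) for name, row in protein_alignment.items()
}
for name, row in codon_alignment.items():
    print(name, row)
```

Because every protein gap is expanded to exactly three nucleotide gaps, all indels in the resulting DNA alignment are multiples of three by construction.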

Is there any alignment program that is aware of codons?

alignment cdna clustalw multiple • 9.3k views
10.3 years ago by
Danville, PA
Rm8.0k wrote:

Some useful programs:

Pairwise Align Codons accepts two coding sequences and determines the optimal global alignment.

Create a codon alignment based on an existing protein multiple sequence alignment using PAL2NAL or RevTrans.

SQUINT: a multiple alignment program and editor

SNAP calculates synonymous and non-synonymous substitution rates based on a set of codon-aligned nucleotide sequences.
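Before feeding an alignment to a tool that expects codon-aligned input, it may be worth checking that gaps really fall on codon boundaries. A minimal sketch, assuming the alignment starts in frame at column 0 (the name `is_codon_aligned` is just for illustration):

```python
def is_codon_aligned(row: str, gap: str = "-") -> bool:
    """Return True if every 3-column block of the row is all gap or gap-free.

    Assumes the reading frame starts at the first alignment column.
    """
    if len(row) % 3 != 0:
        return False
    for i in range(0, len(row), 3):
        n_gaps = row[i:i + 3].count(gap)
        if n_gaps not in (0, 3):
            return False  # a gap splits a codon
    return True


print(is_codon_aligned("ATG---AAA"))
print(is_codon_aligned("ATG-AAA--"))
```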


I would add that, in general, you shouldn't use ClustalW for your protein alignments. There are so many better, fast options out there. As soon as your sequences show any appreciable divergence, ClustalW quickly starts giving you incorrect alignments that grossly underestimate the number of indels. Muscle, Mafft, Prank, FSA, and ProbCons are all great alignment programs that do a much better job. HMMER3 is another option if you happen to have a seed alignment or are aligning a single-domain protein.

Reply written 9.5 years ago by DG
10.3 years ago by
Dave Lunt2.0k
Hull, UK
Dave Lunt2.0k wrote:

I like this Perl program: unlike some other scripts, it does the hard work for you, i.e. you don't have to provide amino acid alignment files. It also does some basic checking for invalid protein-coding sequences (frameshifts etc.).

transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences

Olaf RP Bininda-Emonds

BMC Bioinformatics 2005, 6:156 doi:10.1186/1471-2105-6-156
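For a rough idea of what "basic checking" of a coding sequence can look like, here is a small Python sketch. It is not transAlign's code, only an illustration under simple assumptions: the standard genetic code, a sequence that starts in frame 0, and a hypothetical helper name `cds_problems`.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def cds_problems(seq: str) -> list[str]:
    """List simple signs that a sequence is not a clean in-frame CDS."""
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of three")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # A stop in the final codon is expected; stops before that are suspicious.
    for n, codon in enumerate(codons[:-1], start=1):
        if CODON_TABLE.get(codon) == "*":
            problems.append(f"internal stop codon {codon} at codon {n}")
    return problems


print(cds_problems("ATGAAATAA"))
print(cds_problems("ATGTAAAAA"))
```

With partial cDNAs the true frame may not be 0, so in practice one would probably run a check like this on all three forward frames and keep the one with the fewest problems.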

10.3 years ago by
2184687-1231-83-5.0k wrote:

PRANK has a codon model to align CDS sequences.
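As a sketch of how that might be invoked from a script: PRANK takes its input as `-d=`, an output prefix as `-o=`, and `-codon` switches on the codon model. The file names and the helper name below are placeholders, and the snippet only builds and prints the command rather than running it.

```python
def prank_codon_command(cds_fasta: str, out_prefix: str) -> list[str]:
    """Build a PRANK command line that aligns CDS sequences with the codon model."""
    return ["prank", f"-d={cds_fasta}", f"-o={out_prefix}", "-codon"]


# Placeholder file names; pass the list to subprocess.run() to actually run PRANK.
cmd = prank_codon_command("cds.fasta", "cds_codon")
print(" ".join(cmd))
```

The input sequences need to be in frame and a multiple of three long for the codon model to make sense.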


They moved it here:

Reply written 6.7 years ago by Biojl
4.7 years ago by
United States
kissaj100 wrote:


I align everything by amino acid first, using Clustal Omega (better than any other alignment program for AA; NB this is NOT ClustalW or ClustalX). Then I use CodonAlign 2.0 and the nucleotide sequences from which the proteins were generated to align the nt to the aa alignment.

You could also use MEGA. Import the nt, translate, align, reverse translate.

  • Andor
10.3 years ago by
Barcelona, Spain
Darked894.2k wrote:

I do not know exactly what your input cDNA set is (same species, closely related species, some crazy variable tunicate, different species), but as a general rule, if possible, I would rather stick to aligning same/close species at the nucleotide level. Otherwise you may end up assembling several paralogues into one artifact protein.

If you really must, maybe try to cluster the protein sequences first with CD-Hit or uclust. Even without trying hard, one can align any set of protein sequences in ClustalW, so some prescreening should take care of that.

Finally, some aligners that are in principle genome aligners are good at taking DNA input and aligning mostly coding sequences. Check, e.g., LAST.

9.5 years ago by
Northeastern University
Paul_Muller70 wrote:

I once had to make a large multiple alignment from EST data for 1 genes across 20 or so plant species, all of which had high copy number and divergence thanks to paleopolyploidization, and I decided the best way to do this was a codon level alignment. Since this was a one-time thing and not part of an analysis pipeline I wanted to maintain the ability to manually edit the alignment as well. I used MEGA 4.0. You can open all your cDNA sequences, toggle them to amino acids, align the aa using built-in Clustal functionality, then toggle back to nucleotides maintaining the aa alignment (which are now codons). Another handy feature was that you can choose to align select regions simply by highlighting the columns you want to align. This allows you to manually edit pesky regions while maintaining the bigger picture.

Hope this helps.
