Multiple Sequence Alignment For cDNAs
13.6 years ago

I have a collection of sequences derived from cDNAs, most of them partial. I could align them directly, but I'd prefer to use the protein sequences to guide the alignment, so that indels are multiples of three.

With scripting, I could probably predict the peptide from each cDNA, align the peptides using CLUSTALW, and convert the peptide alignment back to a DNA alignment, but that's a bit of work.

Is there any alignment program that is aware of codons?

cdna alignment multiple clustalw • 12k views
13.6 years ago
Rm 8.3k

Some useful programs:

Pairwise Align Codons accepts two coding sequences and determines the optimal global alignment.

Create a codon alignment based on an existing protein multiple sequence alignment using PAL2NAL or RevTrans.

SQUINT: a multiple alignment program and editor

SNAP calculates synonymous and non-synonymous substitution rates based on a set of codon-aligned nucleotide sequences.
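For anyone curious what "create a codon alignment based on an existing protein alignment" amounts to, here is a minimal Python sketch of the general idea. It is not the actual implementation of PAL2NAL or RevTrans; the function names `thread_codons` and `codon_alignment` are made up for illustration. It assumes each coding sequence is ungapped, already in frame, and corresponds residue for residue to its row in the protein alignment.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an in-frame, ungapped coding sequence onto its gapped protein row.

    Each residue consumes one codon from `cds`; each protein gap becomes
    three nucleotide gaps, so every indel is a multiple of three.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for this protein row")
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    # Anything left over in cds (e.g. a trailing stop codon) is ignored.
    return "".join(out)


def codon_alignment(protein_alignment: dict, cds_by_name: dict) -> dict:
    """Apply thread_codons to every row of a protein alignment (name -> row)."""
    return {
        name: thread_codons(row, cds_by_name[name])
        for name, row in protein_alignment.items()
    }


# Toy example: seqA lacks the threonine codon that seqB has.
protein_rows = {"seqA": "MK-L", "seqB": "MKTL"}
cds = {"seqA": "ATGAAACTG", "seqB": "ATGAAAACCCTG"}
for name, row in codon_alignment(protein_rows, cds).items():
    print(name, row)
```

The sketch does not verify that each codon really translates to the aligned residue; the dedicated tools handle mismatches, frameshifts and alternative genetic codes.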


I would add that, in general, you shouldn't use ClustalW for protein alignments; there are many better and faster options out there. As soon as your sequences show any real divergence, ClustalW starts giving incorrect alignments and grossly underestimates the number of indels. Muscle, Mafft, Prank, FSA and ProbCons are all great alignment programs that do a much better job. HMMER3 is an option if you happen to have a seed alignment or are aligning a single-domain protein.

13.6 years ago
Dave Lunt ★ 2.0k

I like this Perl program. Unlike some other scripts, it does the hard work for you, i.e. you don't have to provide amino acid alignment files. It also does some basic checking for invalid protein-coding sequences (frameshifts etc.).
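To give a rough idea of what "basic checking" of a coding sequence can look like, here is a small Python sketch. It is not transAlign's actual logic, just an illustration; the names `translate` and `check_cds` are hypothetical, and it assumes the standard genetic code and a sequence that is supposed to start in frame. Partial cDNAs may legitimately fail the length check.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI table 1 layout).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate from the first base; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def check_cds(seq: str) -> list:
    """Return a list of simple warnings for a putative coding sequence."""
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of three")
    peptide = translate(seq)
    # A stop anywhere but the last codon hints at a frameshift or wrong frame.
    if "*" in peptide[:-1]:
        problems.append("internal stop codon")
    if "X" in peptide:
        problems.append("untranslatable codon")
    return problems


print(check_cds("ATGAAACTGTAA"))
print(check_cds("ATGTAAACTG"))
```

A sequence that passes these checks is not guaranteed to be a real coding sequence; they only catch the obvious problems before you spend time on the alignment.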

transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences

Olaf RP Bininda-Emonds

BMC Bioinformatics 2005, 6:156 doi:10.1186/1471-2105-6-156

http://www.molekularesystematik.uni-oldenburg.de/33997.html#Sequences

13.6 years ago

PRANK has a codon model to align CDS sequences.

8.0 years ago
kissaj ▴ 110

Hello,

I align everything by amino acid first, using Clustal Omega (better than any other alignment program for AA; https://dx.doi.org/10.1038%2Fmsb.2011.75). NB: this is NOT ClustalW or ClustalX. Then I use CodonAlign 2.0 (http://wbiomed.curtin.edu.au/bioinf/CodonAlign.php), together with the nucleotide sequences from which the proteins were generated, to align the nt to the aa alignment.

You could also use MEGA. Import the nt, translate, align, reverse translate.

  • Andor
13.6 years ago
Darked89 4.6k

I do not know exactly what your input cDNA set is (same species, closely related species, some crazy variable tunicate, different species), but as a general rule, if possible, I would rather stick to aligning sequences from the same or closely related species at the nucleotide level. Otherwise you may end up assembling several paralogues into one artifactual protein.

If you really must, maybe try to cluster the protein sequences first with CD-Hit or uclust. ClustalW will happily align any set of protein sequences, related or not, so some prescreening should take care of that.

Finally, some tools that are in principle genome aligners are good at taking DNA input and aligning mostly coding sequences. Check, e.g., LAST: http://last.cbrc.jp/

12.8 years ago
Paul_Muller ▴ 70

I once had to make a large multiple alignment from EST data for 1 genes across 20 or so plant species, all of which had high copy number and divergence thanks to paleopolyploidization, and I decided the best way to do this was a codon level alignment. Since this was a one-time thing and not part of an analysis pipeline I wanted to maintain the ability to manually edit the alignment as well. I used MEGA 4.0. You can open all your cDNA sequences, toggle them to amino acids, align the aa using built-in Clustal functionality, then toggle back to nucleotides maintaining the aa alignment (which are now codons). Another handy feature was that you can choose to align select regions simply by highlighting the columns you want to align. This allows you to manually edit pesky regions while maintaining the bigger picture.

Hope this helps.
