Question: Multiple Sequence Alignment For cDNAs
12
4.4 years ago by
Haibao Tang2.8k
Richmond, CA
Haibao Tang2.8k wrote:

I have a collection of sequences derived from cDNAs, but most of them are partial. I can start aligning them directly, but I'd prefer to use the protein sequences to guide the alignment (so that indels are multiples of three).

With scripting, perhaps I can predict the peptide from cDNA, align the peptides using CLUSTALW, and convert the peptide alignment back to the DNA alignment - but that's a bit of work.

Is there any alignment program that is aware of codons?

ADD COMMENTlink modified 8 months ago by Biostar ♦♦ 0 • written 4.4 years ago by Haibao Tang2.8k
10
4.4 years ago by
Rm6.8k
Danville, PA
Rm6.8k wrote:

Some useful programs:

Pairwise Align Codons accepts two coding sequences and determines the optimal global alignment.

Create a codon alignment based on an existing protein multiple sequence alignment using PAL2NAL or RevTrans.

SQUINT: a multiple alignment program and editor

SNAP calculates synonymous and non-synonymous substitution rates based on a set of codon-aligned nucleotide sequences.
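The "codon alignment from an existing protein alignment" step that PAL2NAL and RevTrans perform can be illustrated with a small sketch. This is not the code of either tool, only a minimal version of the idea, assuming each unaligned coding sequence is in frame, starts at the first aligned residue, and uses `-` as the gap character. The function name `thread_codons` and the toy sequences are made up for illustration.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Map one aligned protein row back onto its unaligned coding sequence.

    Each residue consumes the next codon from `cds`; each protein gap
    becomes a three-character gap, so indels stay multiples of three.
    """
    residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(cds) < 3 * residues:
        raise ValueError("coding sequence is too short for the aligned protein")
    out = []
    pos = 0  # current offset into the unaligned coding sequence
    for ch in aligned_protein:
        if ch == gap:
            out.append(gap * 3)
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


# Toy input (hypothetical): a two-row protein alignment and the matching CDS.
protein_alignment = {"seq1": "MK-L", "seq2": "MKAL"}
coding_sequences = {"seq1": "ATGAAACTG", "seq2": "ATGAAAGCTCTG"}

codon_alignment = {
    name: thread_codons(row, coding_sequences[name])
    for name, row in protein_alignment.items()
}

for name, row in codon_alignment.items():
    print(name, row)
```

The real tools do more than this, for example checking that the protein and nucleotide sequences actually correspond, so treat the sketch as a way to understand the output rather than a replacement.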

ADD COMMENTlink written 4.4 years ago by Rm6.8k

I would add that, in general, you shouldn't use ClustalW for your protein alignments. There are many better and faster options out there. As soon as your sequences show any real divergence, ClustalW starts giving incorrect alignments and grossly underestimates the number of indels. MUSCLE, MAFFT, PRANK, FSA and ProbCons are all great alignment programs that do a much better job. Consider HMMER3 if you happen to have a seed alignment or are aligning a single-domain protein.

ADD REPLYlink written 3.6 years ago by Dan Gaston3.8k
6
4.4 years ago by
Dave Lunt1.8k
Hull, UK
Dave Lunt1.8k wrote:

I like this Perl program. Unlike some other scripts, it does the hard work for you, i.e. you don't have to provide amino acid alignment files. It also does some basic checking for invalid protein-coding sequences (frameshifts etc.).

transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences

Olaf RP Bininda-Emonds

BMC Bioinformatics 2005, 6:156 doi:10.1186/1471-2105-6-156

http://www.biomedcentral.com/1471-2105/6/156

http://www.molekularesystematik.uni-oldenburg.de/33997.html#Sequences
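As a rough idea of what "basic checking" of protein-coding sequences can look like, here is a minimal sketch. It is not transAlign's actual logic; it only assumes the standard genetic code and flags two common symptoms of a frameshift or wrong frame: a length that is not a multiple of three, and a stop codon before the final codon. The name `check_cds` is hypothetical.

```python
from typing import List

# Stop codons of the standard genetic code (an assumption; other
# translation tables use different sets).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq: str) -> List[str]:
    """Return a list of human-readable problems found in a coding sequence."""
    seq = seq.upper().replace("-", "")  # ignore alignment gaps if present
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3")
    # Split into complete codons, dropping any trailing partial codon.
    complete = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, complete, 3)]
    # A stop codon is only acceptable as the very last codon.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"internal stop codon {codon} at codon {index + 1}")
    return problems


for example in ("ATGAAATAA", "ATGTAAAAA", "ATGAA"):
    print(example, check_cds(example))
```

For partial cDNAs a length that is not a multiple of three is not necessarily an error, so in practice you would probably try all three frames before rejecting a sequence.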

ADD COMMENTlink written 4.4 years ago by Dave Lunt1.8k
4
4.4 years ago by
2184687-1231-83-4.6k wrote:

PRANK has a codon model to align CDS sequences: http://www.ebi.ac.uk/goldman-srv/prank/

ADD COMMENTlink written 4.4 years ago by 2184687-1231-83-4.6k

They moved it here: http://code.google.com/p/prank-msa/

ADD REPLYlink written 9 months ago by Biojl1.2k
1
4.4 years ago by
Darked893.6k
Barcelona, Spain
Darked893.6k wrote:

I do not know exactly what your input cDNA set is (same species, closely related species, some crazy variable tunicate, different species), but as a general rule, if possible, I would stick to aligning same or close species at the nucleotide level. Otherwise you may end up assembling several paralogues into one artifactual protein.

If you really must, maybe try clustering the protein sequences first with CD-HIT or UCLUST. ClustalW will happily align any set of protein sequences, related or not, so some prescreening should guard against that.

Finally, some programs that are in principle genome aligners are good at taking DNA input and aligning mostly coding sequences. See, e.g., LAST: http://last.cbrc.jp/

ADD COMMENTlink written 4.4 years ago by Darked893.6k
0
3.6 years ago by
Paul_Muller60
Northeastern University
Paul_Muller60 wrote:

I once had to make a large multiple alignment from EST data for 1 genes across 20 or so plant species, all of which had high copy number and divergence thanks to paleopolyploidization, and I decided the best way to do this was a codon level alignment. Since this was a one-time thing and not part of an analysis pipeline I wanted to maintain the ability to manually edit the alignment as well. I used MEGA 4.0. You can open all your cDNA sequences, toggle them to amino acids, align the aa using built-in Clustal functionality, then toggle back to nucleotides maintaining the aa alignment (which are now codons). Another handy feature was that you can choose to align select regions simply by highlighting the columns you want to align. This allows you to manually edit pesky regions while maintaining the bigger picture.

Hope this helps.

ADD COMMENTlink written 3.6 years ago by Paul_Muller60