DNA Multiple Alignment With Zero End-Gap Penalty
3
5
11.7 years ago
Lyco ★ 2.3k

Again a case where I feel unable to help a colleague of mine, but I am sure that somebody here has an easy answer.

What we need is a program for multiple-alignment of DNA sequences. I am a protein person, so I had a look at Muscle and Mafft, which both can handle DNA sequences as well. However, this did not work well, since I don't see a way of tweaking the match/gap parameters the way we need them.

There are several alignment scenarios that need to be covered, one of them being DNA sequences with a relatively modest overlap. In this case, the alignment programs (treating end gaps like normal gaps) found some non-existent 'similarity' between the DNAs and aligned them that way, rather than producing the correct alignment with enormous end gaps. It didn't even help to add a reference sequence to the alignment (note that the reference is not necessarily from the same species, so there are some mismatches, but clearly enough similarity to guide the alignment process).

This problem resembles the 'assembly problem' common to the sequencing community (of which I am not a member), so I had a look at things like phred/phrap (which is much too expensive for us) and bwa (which uses lots of funny terminology like 'color space', which is beyond my horizon). However, our sequences are not exactly genome-size but typically 1-10 kb pieces of genomic and mRNA sequence. Moreover, the 'assembly-type' software does not return anything that looks like a multiple alignment.

Can anybody recommend (free) software that does conventional DNA multiple alignments but allows setting the end-gap penalty to zero and allows very cheap gap extensions for accommodating splicing? Alternatively, is there free or cheap DNA assembly software that can produce multiple alignment files in a standard MSA format (FASTA, MSF, whatever)?
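To make 'zero end-gap penalty' concrete for the modest-overlap case, here is a minimal pairwise sketch in Python. It is only an illustration of the scoring idea, not what Muscle or MAFFT do internally. It assumes a linear gap cost (no separate cheap extension), and the function name `overlap_align` and the score values are made up for the example.

```python
def overlap_align(a, b, match=2, mismatch=-3, gap=-4):
    """Pairwise alignment in which leading and trailing gaps cost nothing.

    Internal gaps are charged `gap` per position (linear cost, an assumption
    made to keep the sketch short).
    """
    n, m = len(a), len(b)
    # score[i][j]: best score for aligning a[:i] with b[:j].
    # Row 0 and column 0 stay at 0, which makes leading gaps free.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        move[i][0] = "up"
    for j in range(1, m + 1):
        move[0][j] = "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            best = max(diag, up, left)
            score[i][j] = best
            move[i][j] = "diag" if best == diag else ("up" if best == up else "left")

    # Trailing gaps are free: take the best cell in the last row or last column.
    candidates = [(score[n][j], n, j) for j in range(m + 1)]
    candidates += [(score[i][m], i, m) for i in range(n + 1)]
    best_score, i, j = max(candidates)

    out_a, out_b = [], []
    # Unpenalised trailing overhang (built back to front).
    for k in range(n - 1, i - 1, -1):
        out_a.append(a[k])
        out_b.append("-")
    for k in range(m - 1, j - 1, -1):
        out_a.append("-")
        out_b.append(b[k])
    # Trace back through the matrix to the top-left corner.
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif step == "up":
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best_score, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = overlap_align("AAAAACGTACGT", "CGTACGTTTTT")
print(score)   # 14
print(top)     # AAAAACGTACGT----
print(bottom)  # -----CGTACGTTTTT
```

With end gaps charged like internal gaps, the same two sequences would be pushed towards a forced end-to-end alignment, which is the behaviour described above.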

dna multiple assembly • 5.2k views
0

what is the biggest contig that comes out of the assembly?

0

It is not really an assembly problem, I just guessed that it can be treated similarly. In a typical situation, a genomic sequence of 2-10 kb is aligned to a number of genomic and cDNA sequences, either coming from the same or a closely related species. It would already be great if I had a solution for aligning one gene (exons, introns, everything) to e.g. 10 different cDNA fragments with a few mutations.

2
11.7 years ago
Andreas ★ 2.5k

Hi Lyco,

you could try Mafft's LINSI (executable is linsi or mafft-linsi) which 'allows large terminal gaps' (see PMID 18372315)

Andreas

0

Actually, LINSI is what we are using at the moment. It works in some cases but sometimes it messes up big time. I didn't find a way of specifically influencing endgaps, do you know one?

0

Really? That's surprising. Are the sequences very divergent? If I remember correctly, then ClustalW does not penalize terminal gaps by default. If you don't have too many sequences you could try T-Coffee as suggested by qdjm.

2
11.7 years ago
Qdjm 1.9k

Hi Lyco,

Sounds like you want an alignment tool that does local alignment rather than global alignment.

Have you looked at T-Coffee? I've never used it but it looks like it has the command line arguments that you need: Zero terminal gap penalty, Gap extension penalty, Gap open penalty

So, for example if you wanted a gap open penalty of X and an extension penalty of Y (where X and Y are negative values), you would use:

-tg_mode=2 -gapopen=X -gapext=Y
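For batch use, those options could be assembled from a script. The sketch below only builds the command line and does not run anything. The helper name `tcoffee_command`, the file name `seqs.fasta` and the penalty values are placeholders, and I am assuming the input file is passed as the first argument.

```python
import shlex


def tcoffee_command(fasta_path, gap_open, gap_ext):
    """Build (but do not run) a t_coffee call with unpenalised terminal gaps."""
    return [
        "t_coffee",
        fasta_path,
        "-tg_mode=2",              # terminal gaps not penalised
        f"-gapopen={gap_open}",    # X in the text above, a negative value
        f"-gapext={gap_ext}",      # Y in the text above, a negative value
    ]


# Placeholder values: an expensive gap opening and a cheap extension.
cmd = tcoffee_command("seqs.fasta", gap_open=-50, gap_ext=-1)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run` once the values have been checked against the T-Coffee documentation.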

0

Thank you for the idea. I abandoned T-Coffee several years ago for Muscle and Mafft because they are better and much faster for my usual tasks. This particular application might be one reason to revisit T-Coffee.

0

According to the Wikipedia page on sequence alignment tools, MAFFT does do local alignment (i.e. no terminal gap penalties). Did you try the --lop --lexp and --localpair command line options?
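In case it helps with batch mode, here is a rough sketch of calling MAFFT with those options from Python. The helper names, the file names and the penalty values are placeholders rather than recommendations, and the sketch assumes `mafft` is on the PATH and writes the alignment to standard output.

```python
import subprocess


def mafft_local_command(fasta_path, lop, lexp):
    """Build a mafft call that uses local pairwise alignment with custom penalties."""
    return [
        "mafft",
        "--localpair",
        "--lop", str(lop),    # gap opening penalty for the local pairwise stage
        "--lexp", str(lexp),  # gap extension-related penalty for the local pairwise stage
        fasta_path,
    ]


def run_mafft(fasta_path, out_path, lop, lexp):
    """Run mafft and save the alignment it prints to standard output."""
    with open(out_path, "w") as out:
        subprocess.run(mafft_local_command(fasta_path, lop, lexp), stdout=out, check=True)


# Example (not executed here): run_mafft("seqs.fasta", "aligned.fasta", lop=-2.0, lexp=-0.1)
```

Looping `run_mafft` over a directory of FASTA files would cover the automation part; which penalty values suit spliced cDNA against genomic sequence would still need to be found by trial.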

1
11.7 years ago
Graslevy ▴ 240

Hey Lyco, the Genomics Workbench from CLC (http://www.clcbio.com/index.php?id=1240) is perfect for what you want to achieve. It's not free, though, but you can try it free for 30 days. Also try Geneious Pro...

1

Thanks for the suggestions. I hope that someone will come up with a free solution. Moreover, both CLC and Geneious look GUIy, and a GUI is the last thing I need for this task. We are more into automating things in batch mode.