Question: DNA Multiple Alignment With Zero End-Gap Penalty
7.7 years ago by
Lyco2.3k
Germany
Lyco2.3k wrote:

Again a case where I feel unable to help a colleague of mine, but I am sure that somebody here has an easy answer.

What we need is a program for multiple-alignment of DNA sequences. I am a protein person, so I had a look at Muscle and Mafft, which both can handle DNA sequences as well. However, this did not work well, since I don't see a way of tweaking the match/gap parameters the way we need them.

There are several alignment scenarios that need to be covered, one of them being DNA sequences with a relatively modest overlap. In this case, the alignment programs (treating endgaps like normal gaps) decided to find some non-existent 'similarity' between the DNAs and aligned them this way rather than providing the correct alignment with enormous end-gaps. It didn't even help to add a reference sequence to the alignment (note that the reference is not necessarily from the same species, so there are some mismatches but clearly enough similarity to guide the alignment process).
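To illustrate the end-gap effect described above, here is a minimal pairwise sketch in Python. It is not what MAFFT or Muscle do internally; it assumes a simple linear gap penalty, and the function name, scoring values and toy sequences are all made up for illustration. With free end gaps, the first row and column of the scoring matrix start at zero and the best score is read from anywhere in the last row or column, so overhanging ends cost nothing.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, free_end_gaps=True):
    """Best alignment score of a and b under a linear gap penalty.

    free_end_gaps=True  -> leading/trailing gaps are not penalized
    free_end_gaps=False -> ordinary global alignment
    (hypothetical helper; scoring values are arbitrary)
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not free_end_gaps:
        # Global alignment: leading gaps are charged like any other gap.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    if free_end_gaps:
        # Trailing gaps are free: take the best cell in the last row or column.
        return max(max(score[-1]), max(row[-1] for row in score))
    return score[-1][-1]


# Two toy fragments that overlap only in "ACGTACGT".
frag_a = "TTTTTTACGTACGT"
frag_b = "ACGTACGTCCCCCC"

print(align_score(frag_a, frag_b, free_end_gaps=True))   # 8: the overlap is found
print(align_score(frag_a, frag_b, free_end_gaps=False))  # lower: end gaps are penalized
```

With end gaps penalized, the true overlap scores worse than a forced end-to-end alignment, which is the behaviour complained about above.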

This problem resembles the 'assembly problem' common to the sequencing community (of which I am not a member). Thus, I had a look at things like phred/phrap (which is much too expensive for us) or bwa (which uses lots of funny terminology like 'color space', which is beyond my horizon). However, our sequences are not exactly genome-size but typically 1-10 kb pieces of genomic and mRNA sequence. Moreover, the 'assembly-type' software does not return anything that looks like a multiple alignment.

Can anybody recommend (free) software that either does conventional DNA multiple alignments but allows setting the endgap penalty to zero and allows very cheap gap extensions for accommodating splicing? Alternatively, is there a free or cheap DNA assembly software that can produce multiple alignment files in a standard MSA format (fasta, MSF, whatever)?

assembly multiple dna • 3.8k views
ADD COMMENTlink written 7.7 years ago by Lyco2.3k

What is the biggest contig that comes out of the assembly?

ADD REPLYlink written 7.7 years ago by Jeremy Leipzig18k

It is not really an assembly problem, I just guessed that it can be treated similarly. In a typical situation, a genomic sequence of 2-10 kb is aligned to a number of genomic and cDNA sequences, either coming from the same or a closely related species. It would already be great if I had a solution for aligning one gene (exons, introns, everything) to e.g. 10 different cDNA fragments with a few mutations.

ADD REPLYlink written 7.7 years ago by Lyco2.3k
7.7 years ago by
Andreas2.4k
Singapore
Andreas2.4k wrote:

Hi Lyco,

You could try Mafft's LINSI (executable is linsi or mafft-linsi), which 'allows large terminal gaps' (see PMID 18372315).

Andreas

ADD COMMENTlink written 7.7 years ago by Andreas2.4k

Actually, LINSI is what we are using at the moment. It works in some cases but sometimes it messes up big time. I didn't find a way of specifically influencing endgaps; do you know one?

ADD REPLYlink written 7.7 years ago by Lyco2.3k

Really? That's surprising. Are the sequences very divergent? If I remember correctly, ClustalW does not penalize terminal gaps by default. If you don't have too many sequences you could try T-Coffee as suggested by qdjm.

ADD REPLYlink written 7.7 years ago by Andreas2.4k
7.7 years ago by
Qdjm1.9k
Toronto
Qdjm1.9k wrote:

Hi Lyco,

Sounds like you want an alignment tool that does local alignment rather than global alignment.

Have you looked at T-Coffee? I've never used it, but it looks like it has the command line arguments that you need: zero terminal gap penalty, gap extension penalty, and gap open penalty.

So, for example if you wanted a gap open penalty of X and an extension penalty of Y (where X and Y are negative values), you would use:

-tg_mode=2 -gapopen=X -gapext=Y
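Since batch use seems to be the goal, the options above could be assembled from a script roughly like this. This is only a sketch: the helper name, the input file name and the penalty values are placeholders, not recommendations, and it assumes T-Coffee is installed as `t_coffee` and takes the sequence file as its first argument. Only the three flags quoted above are used.

```python
import shlex


def tcoffee_command(fasta_path, gap_open, gap_ext):
    """Build a T-Coffee command line with unpenalized terminal gaps.

    Hypothetical helper: it only assembles the argument list and does
    not check that t_coffee is actually installed.
    """
    return [
        "t_coffee",
        fasta_path,
        "-tg_mode=2",             # terminal gaps not penalized
        f"-gapopen={gap_open}",   # X in the text above (negative value)
        f"-gapext={gap_ext}",     # Y in the text above (negative value)
    ]


# Placeholder input file and penalties, chosen only to show the shape.
cmd = tcoffee_command("fragments.fasta", gap_open=-50, gap_ext=-1)
print(shlex.join(cmd))

# To actually run it (requires T-Coffee on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to loop over many input files and to log the exact command used for each one.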
ADD COMMENTlink written 7.7 years ago by Qdjm1.9k

Thank you for the idea. I abandoned T-Coffee several years ago for Muscle and Mafft because they are better and much faster for my usual tasks. This particular application might be one reason to revisit T-Coffee.

ADD REPLYlink written 7.7 years ago by Lyco2.3k

According to the Wikipedia page on sequence alignment tools, MAFFT does do local alignment (i.e. no terminal gap penalties). Did you try the --lop --lexp and --localpair command line options?

ADD REPLYlink written 7.7 years ago by Qdjm1.9k
7.7 years ago by
Graslevy240
UK
Graslevy240 wrote:

Hey Lyco, the Genomics Workbench from CLC (http://www.clcbio.com/index.php?id=1240) is perfect for what you want to achieve. It's not free, though, but you can try it free for 30 days. Also try Geneious Pro...

ADD COMMENTlink written 7.7 years ago by Graslevy240

Thanks for the suggestions. I hope that someone will come up with a free solution. Moreover, both CLC and Geneious look GUIy, and a GUI is the last thing I need for this task. We are more into automating things in batch mode.

ADD REPLYlink written 7.7 years ago by Lyco2.3k