Question: Blast2 Like Local Aligner
PoGibas (4.7k), Vilnius, wrote 6.9 years ago:

I have many ~250 bp sequences and need to align them locally. I tried EMBOSS water and matcher; they work well, but they report only the best-scoring alignment. I need something similar to BLAST2, which reports all the possible alignments. I hope someone can help.

local alignment • 1.8k views
written 6.9 years ago by PoGibas (4.7k)

Why not blast2 or megablast, then? Too slow? Where did the sequences come from, how many are there (are they reads?), and what is the reference? The number of sequences matters because it determines the trade-off between sensitivity and run time. Off the top of my head, try ssearch36 (in the FASTA utilities); if that is too slow, try something else, e.g. fasta, megablast, or blat.

written 6.9 years ago by Michael Dondrup (45k)

look here: http://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml

written 6.9 years ago by Michael Dondrup (45k)

About 200 sequences so far, and more will follow. The main idea is to align all the sequences against each other (200 × 200), and I really want to automate those alignments as much as possible.

written 6.9 years ago by PoGibas (4.7k)
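
The all-against-all setup described above can be scripted. The following is only a rough sketch: it assumes each sequence sits in its own FASTA file (the `seq_000.fa` style names are made up), that the aligner is called as `program query_file library_file` (the usual convention for the FASTA package programs, but check your installed version), and that A-vs-B and B-vs-A give the same result, so each unordered pair is needed only once. It builds the command lines without running anything.

```python
from itertools import combinations


def pairwise_jobs(fasta_paths, program="lalign36"):
    """Return one argv list per unordered pair of FASTA files.

    `program` is assumed to accept `query_file library_file` as its
    positional arguments; no other options are passed here.
    """
    return [[program, query, target]
            for query, target in combinations(fasta_paths, 2)]


# Hypothetical file names, one sequence per file.
paths = [f"seq_{i:03d}.fa" for i in range(200)]
jobs = pairwise_jobs(paths)
print(len(jobs))   # 19900 unordered pairs instead of 200 * 200 = 40000
print(jobs[0])     # ['lalign36', 'seq_000.fa', 'seq_001.fa']
```

Each entry in `jobs` could then be handed to `subprocess.run(job, capture_output=True, text=True)` and the output saved per pair.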
Michael Dondrup (45k), Bergen, Norway, wrote 6.9 years ago:

If water, which performs a full Smith-Waterman alignment, is fast enough for your number of sequences, try ssearch, which does exactly the same. It comes with the FASTA tools and supports SSE and multithreading, so it could be even faster. It will report everything there is to find (up to an E-value of 10 by default), which means you will also see a lot of poor alignments.

written 6.9 years ago by Michael Dondrup (45k)
Bill Pearson (830) wrote 6.9 years ago:

lalign36 does exactly what you want. It will show you all the non-overlapping alignments of a pair of sequences.

ssearch36 does something slightly different (and perhaps more like blast2seq); it will show you all of the parts of the target sequence that align with the query, but once part of the target is aligned, it will not be aligned again.

For example, for the sequences X A A B B Y vs Z A B A B V, both lalign36 and ssearch36 should show:

x a A B b y          (here capital letters indicate actual alignment,
  z A B a b v         lower case indicate context and are not aligned)

  x a A B b y
z a b A B v

But lalign36 would also show:

x A a b b y
z A b a b v

x a a b B y
z a b a B v

x a a b B y
    z a B a b v

    x A a b b y
z a b A b v

written 6.9 years ago by Bill Pearson (830)
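
To make the X A A B B Y vs Z A B A B V example concrete, here is a toy sketch of the "non-intersecting alignments" idea. It is not the sim2/Waterman-Eggert code that lalign36 uses: it simply reruns a plain Smith-Waterman fill and forbids any residue pair that an earlier alignment already used. The function name and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for illustration.

```python
def nonintersecting_alignments(a, b, match=1, mismatch=-1, gap=-2, min_score=1):
    """Repeat Smith-Waterman, forbidding residue pairs already aligned.

    Returns a list of (score, [(i, j), ...]) with 0-based positions in a and b.
    """
    used = set()   # residue pairs claimed by earlier alignments
    found = []
    while True:
        # Fill the score matrix, skipping diagonal moves onto used pairs.
        H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        best, end = 0, (0, 0)
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                score = max(0, H[i - 1][j] + gap, H[i][j - 1] + gap)
                if (i - 1, j - 1) not in used:
                    s = match if a[i - 1] == b[j - 1] else mismatch
                    score = max(score, H[i - 1][j - 1] + s)
                H[i][j] = score
                if score > best:
                    best, end = score, (i, j)
        if best < min_score:
            return found
        # Trace back from the best cell to recover the aligned pairs.
        i, j = end
        pairs = []
        while H[i][j] > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if (i - 1, j - 1) not in used and H[i][j] == H[i - 1][j - 1] + s:
                pairs.append((i - 1, j - 1))
                i, j = i - 1, j - 1
            elif H[i][j] == H[i - 1][j] + gap:
                i -= 1
            else:
                j -= 1
        if not pairs:
            return found
        used.update(pairs)
        found.append((best, pairs[::-1]))


hits = nonintersecting_alignments("XAABBY", "ZABABV")
print([score for score, _ in hits])   # [2, 2, 1, 1, 1, 1]
for score, pairs in hits:
    print(score, pairs)
```

With these assumed scores, the two score-2 hits are the A B / A B alignments, and the four score-1 hits are the single-letter matches listed above, six alignments in total with no residue pair used twice.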

Have you tested this? Because the documentation states the contrary: "lalign36 - Calculate multiple, non-intersecting alignments using the sim2 implementation of the Waterman-Eggert algorithm [21] developed by Xiaoqui Huang and Web Miller [7]. Statistical estimates are calculated from Smith-Waterman scores of shuffled sequences." This seems to contradict your example.

written 6.9 years ago by Michael Dondrup (45k)

This information is partially misleading. While it is true that only non-overlapping local alignments are generated by Smith-Waterman, this is not 'better' with lalign (as documented). In fact, I believe this outcome is dictated by the Smith-Waterman algorithm itself, isn't it?

written 6.9 years ago by Michael Dondrup (45k)

I'm not sure why you think the Waterman-Eggert sim strategy contradicts my example. I'm also not sure why you think I said lalign (Waterman-Eggert) is "better"; it is more exhaustive.

written 6.9 years ago by Bill Pearson (830)

Actually, I wanted to test a simple example. While this worked fine for ssearch and fasta, I didn't get any alignment with lalign. I used a target sequence containing a repeat, xyzabbaabbaxyz, and the query abba. If I understand correctly, that should yield 2 local alignments (with equal scores). With fasta, ssearch and glsearch I got both of them, but with lalign I got 0 alignments. That might be a bug in lalign, but there would still be no difference, so your first point about ssearch36 is void; at best there is no difference. I don't understand how an algorithm can be more 'exhaustive' than an exact one.

written 6.9 years ago by Michael Dondrup (45k)
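
The expectation of two equal-scoring local alignments for this repeat test case can be checked with a small sketch. It fills a plain Smith-Waterman matrix and lists every cell that reaches the maximum score. The scoring values (match +1, mismatch -1, linear gap -2) and the function name are assumptions for illustration, not what ssearch36 or lalign36 use internally.

```python
def sw_best_endpoints(query, target, match=1, mismatch=-1, gap=-2):
    """Return (best_score, end_cells) for a plain Smith-Waterman fill.

    end_cells holds 1-based (query_end, target_end) positions of every
    cell whose score equals the matrix maximum.
    """
    rows, cols = len(query) + 1, len(target) + 1
    H = [[0] * cols for _ in range(rows)]
    best, ends = 0, []
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align the two residues
                          H[i - 1][j] + gap,     # gap in target
                          H[i][j - 1] + gap)     # gap in query
            if H[i][j] > best:
                best, ends = H[i][j], [(i, j)]
            elif H[i][j] == best and best > 0:
                ends.append((i, j))
    return best, ends


best, ends = sw_best_endpoints("abba", "xyzabbaabbaxyz")
print(best, ends)   # 4 [(4, 7), (4, 11)]
```

Under these assumptions the maximum score of 4 is reached twice, ending at target positions 7 and 11, which corresponds to the two copies of abba in the target.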

I just noticed that you are the first author of the FASTA tools; if that is the case, please excuse my ignorance! I still don't get two things, though: 1. How come lalign36 (36.3.5c Dec, 2011(preload8)) shows 0 hits? 2. If I understand your example correctly, lalign is supposed to yield less-than-optimal-scoring alignments for a segment? If so, I don't see how this is relevant to the application in the original post.

written 6.9 years ago by Michael Dondrup (45k)