Question

Blast2 Like Local Aligner

1

13.3 years ago

PoGibas 5.1k

I have many ~250 bp sequences and need to do local alignment. I tried EMBOSS water and matcher; they work well, but they report only the best-scoring alignment. I need something similar to BLAST2, where all possible alignments are given. I hope someone can help.

local alignment • 3.9k views

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 13.3 years ago by PoGibas 5.1k

0

Why not blast2 or megablast, then? Too slow? Where did you get the sequences from, how many sequences (reads?) are there, and what is the reference? The number of sequences matters because it determines the trade-off between sensitivity and run time. Off the top of my head: try ssearch36 (in the FASTA utilities); if that is too slow, try something else, e.g. fasta, megablast, or blat.

ADD REPLY • link 13.3 years ago by Michael 55k

0

look here: http://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml

ADD REPLY • link 13.3 years ago by Michael 55k

0

~200 sequences so far, and more are coming. The main idea is to align all the sequences against each other (200 × 200), and I really want to make those alignments as automatic as possible.
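A minimal sketch of how the all-against-all step could be automated, assuming each sequence sits in its own FASTA file. The file names and the `pairwise_jobs` helper are hypothetical, and the bare `program query_file library_file` form is an assumption about the simplest invocation; check the FASTA documentation for the options you need.

```python
from itertools import combinations


def pairwise_jobs(fasta_paths, program="lalign36"):
    """Yield one argument list per unordered pair of FASTA files.

    Assumes the aligner is called as: program query_file library_file.
    """
    for query, target in combinations(fasta_paths, 2):
        yield [program, query, target]


# Hypothetical file names, one sequence per file.
paths = [f"seq_{i:03d}.fa" for i in range(200)]
jobs = list(pairwise_jobs(paths))

# 200 * 199 / 2 unordered pairs, self-comparisons excluded.
print(len(jobs))
```

Each argument list could then be handed to `subprocess.run` and the output saved per pair. Unordered pairs halve the work compared with the full 200 × 200 grid, which is reasonable as long as the comparison is treated as symmetric.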

ADD REPLY • link 13.3 years ago by PoGibas 5.1k

score 2 · Answer 1 · 2012-03-21

If water, which performs full Smith-Waterman, is efficient enough for your number of sequences, try ssearch, which does exactly this. It comes with the FASTA tools and has SSE and multithreading support, so it could be even faster. It will report everything there is to find (up to an E-value of 10 by default), which means you will also see a lot of poor alignments.

score 1 · Answer 2 · 2012-03-21

1

13.3 years ago

Bill Pearson ★ 1.1k

lalign36 does exactly what you want. It will show you all the non-overlapping alignments of a pair of sequences.

ssearch36 does something slightly different (and perhaps more like blast2seq); it will show you all of the parts of the target sequence that align with the query, but once part of the target is aligned, it will not be aligned again.

For example, in the sequence X A A B B Y vs Z A B A B V, both lalign36 and ssearch36 should show:

x a A B b y          (here capital letters indicate actual alignment,
  z A B a b v         lower case indicate context and are not aligned)

  x a A B b y
z a b A B v

But lalign36 would also show:

x A a b b y
z A b a b v

x a a b B y
z a b a B v

x a a b B y
    z a B a b v

    x A a b b y
z a b A b v
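The six alignments in this example can be reproduced with a toy model. The sketch below only looks at exact, ungapped matches along each diagonal; it is not the sim2/Waterman-Eggert code used by lalign36 or the Smith-Waterman code used by ssearch36, and the function names `diagonal_matches` and `target_exclusive` are made up for illustration. The greedy filter mimics the rule "once part of the target is aligned, it will not be aligned again".

```python
def diagonal_matches(query, target):
    """Return maximal runs of identical residues along each diagonal.

    Each run is a tuple (length, query_start, target_start), 0-based.
    """
    runs = []
    n, m = len(query), len(target)
    for d in range(-(n - 1), m):  # d = target index - query index
        i = max(0, -d)
        j = i + d
        run_len = 0
        while i < n and j < m:
            if query[i] == target[j]:
                run_len += 1
            else:
                if run_len:
                    runs.append((run_len, i - run_len, j - run_len))
                run_len = 0
            i += 1
            j += 1
        if run_len:  # run that reaches the end of the diagonal
            runs.append((run_len, i - run_len, j - run_len))
    return runs


def target_exclusive(runs):
    """Keep the longest runs first; drop runs that reuse target positions."""
    used = set()
    kept = []
    for length, q0, t0 in sorted(runs, key=lambda r: (-r[0], r[1], r[2])):
        span = set(range(t0, t0 + length))
        if span & used:
            continue
        used |= span
        kept.append((length, q0, t0))
    return kept


QUERY = "XAABBY"
TARGET = "ZABABV"

all_runs = diagonal_matches(QUERY, TARGET)   # every non-intersecting run
exclusive_runs = target_exclusive(all_runs)  # target positions used once

print(len(all_runs), len(exclusive_runs))
```

Under these assumptions the first list holds six runs (the two A B matches plus the four single-letter matches), and the filtered list holds only the two A B matches, which is the split between the two programs described above.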

ADD COMMENT • link 13.3 years ago by Bill Pearson ★ 1.1k

0

Have you tested this? Because the documentation states the contrary: "lalign36 - Calculate multiple, non-intersecting alignments using the sim2 implementation of the Waterman-Eggert algorithm [21] developed by Xiaoqui Huang and Web Miller [7]. Statistical estimates are calculated from Smith-Waterman scores of shuffled sequences." This seems to contradict your example.

ADD REPLY • link 13.3 years ago by Michael 55k

0

This information is partially misleading. While it is true that Smith-Waterman generates only non-overlapping local alignments, this is not 'better' with lalign (as documented). In fact, I believe this outcome is dictated by the Smith-Waterman algorithm itself, isn't it?

ADD REPLY • link 13.3 years ago by Michael 55k

0

I'm not sure why you think the Waterman-Eggert sim strategy contradicts my example. I'm also not sure why you think I said lalign (Waterman-Eggert) is "better"; it is more exhaustive.

ADD REPLY • link 13.3 years ago by Bill Pearson ★ 1.1k

0

Actually, I wanted to test a simple example. While it worked fine for ssearch and fasta, I didn't get any alignment with lalign. I used a target sequence containing a repeat, xyzabbaabbaxyz, and the query abba. If I understand correctly, that should yield 2 local alignments (with equal scores). With fasta, ssearch, and glsearch I got both, but with lalign I got 0 alignments. That might be a bug in lalign, but even so there would be no difference, so your first point about ssearch36 is void; at best there is no difference. I don't understand how an algorithm can be more 'exhaustive' than an exact one.
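As a sanity check on the expected count, the exact copies of the query in this test target can be located directly. This is only a string search under the assumption that exact matches are what matters here; it says nothing about how any of the aligners score or report them, and `exact_hits` is a made-up helper name.

```python
import re


def exact_hits(query, target):
    """Return 0-based start positions of every exact copy of query in target."""
    # A zero-width lookahead means overlapping copies would be reported too.
    pattern = re.compile(f"(?={re.escape(query)})")
    return [m.start() for m in pattern.finditer(target)]


hits = exact_hits("abba", "xyzabbaabbaxyz")
print(hits)
```

The search finds copies starting at positions 3 and 7, which matches the expectation of two equal-scoring local alignments.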

ADD REPLY • link 13.3 years ago by Michael 55k

0

I just noticed you are the first author of the FASTA tools; if that is the case, please excuse my ignorance! I still don't get two things, though: 1. How come lalign36 (36.3.5c Dec, 2011(preload8)) shows 0 hits? 2. If I understand your example correctly, lalign36 is supposed to yield less-than-optimal scoring alignments for a segment? If so, I don't see how this is relevant to the application in the original post.

ADD REPLY • link 13.3 years ago by Michael 55k