Alignment Algorithm
1
5
Entering edit mode
10.7 years ago

Hi all, I'm playing with an alignment algorithm, that should look like BLAT (I think).

From the Reference sequence, I've sorted all the words of size=K

AAAAAAACT  index in ref=1901
AAAAAAACT  2504
AAAAAACTT  2505
(...)
TTTTTTACC  189
TTTTTTTTA  87

when we want to map a sequence on the chromosome, the query is also indexed

 AAAAAAACG  20
 AAAAAACGA  21   
 (...)
 TTAATGGAG  4

so, the words of the query can be quickly mapped to the reference using a binary-search. At the end, I got an array of hits:

struct Hit
  {
  int position_in_ref;
  int position_in_query;
  int word=K;
  }

that should look like this (the figure is from the BLAT paper)

blat paper

Now, how should I use this information to get the best alignment(s). I suspect a simple solution should be something like a dynamic algorithm but I'm lost at this point. Any suggestion ?

algorithm alignment pairwise • 3.4k views
ADD COMMENT
1
Entering edit mode

by "alignments", do you mean gapped or ungapped?

ADD REPLY
0
Entering edit mode

gapped alignments

ADD REPLY
6
Entering edit mode
10.7 years ago

So you've done your search stage and obtained candidate alignment seeds. Now you need to choose groups to join and fill the gaps. How to select groups of seeds? BLAT merges overlapping seeds to start with. Then you could look for groups that form chains within a certain diagonal band. You can find the band by subtracting the query base coordinate from the reference base coordinate. Then sort everything by band and the chains will be grouped together. I think BLAT recurses at some point, to further join these seeds by repeating the Kmer finding operation locally, with smaller and smaller Kmers.

Or, if you don't want to recreate BLAT exactly, you can use dynamic programming (e.g. Smith Waterman) to stitch them together. If your seeds are good enough, you could assume they are true and use DP to join them up, or you could go for it and DP re-align the whole area to get the best possible result, using banded alignment. A banded alignment means that you can ignore the DP cells outside your diagonal band and speed up the calculation. When you say "best" alignment, that implies to me a rigorous search i.e. SW, but maybe I've misunderstood that point.

ADD COMMENT

Login before adding your answer.

Traffic: 2287 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6