Salmon Algorithm Binary Search
1
0
Entering edit mode
5.1 years ago
pablo ▴ 280

Hi guys,

"In SA, we matched successively longer prefixes (left-to-right) of query string (binary search)

In BWT, we will match successively longer suffixes (right-to-left) of query string (reverse BWT transform)"

So, for Salmon's algorithm which uses a SA, it is doing a binary search? I get confused when I read that

Best, Vincent

salmon • 1.1k views
0
Entering edit mode

tagging: Rob

3
Entering edit mode
5.1 years ago
Rob 6.2k

Hi Vincent,

The answer to your question is yes --- the mapping algorithm uses binary search --- but with two very important twists. First, it's important to note that the mapping algorithm augments the suffix array with a hash table. This hash table stores, for each k-mer, the _interval_ of the suffix array where suffixes beginning with this k-mer as a prefix begin. Recall that since suffixes in the SA are sorted lexicographically, all suffixes starting with a given k-mer will be contiguous in the SA. This means that instead of the binary search being O(M lg N), it is O(M lg W) where W is the width of this suffix array interval. For sufficiently long k-mers, this leads to intervals which typically have a very short width. Second, the mapping algorithm uses the "simple accelerant" (as suggested in Gusfield), which remembers the size of the prefix which must already be matched as we conduct the search --- this leads to a practical algorithm that works like O(M + lg W) in most cases (though, we don't use the LCP array so it's technically O(M lg W) in the worst case). The main benefit of using the uncompressed suffix array like this is that these modifications make search very fast, and since the suffix array is uncompressed, both count and locate queries are extremely efficient.

0
Entering edit mode

So, Salmon doest not use a BWT algorithm, and just uses an uncompressed SA ? Whereas HISAT uses a SA to build a BWT matrix to search pattern ?

0
Entering edit mode

Salmon (in mapping based mode), uses an uncompressed SA aided by a hash table. HISAT uses a hierarchical FM-index (itself based on the BWT and the suffix array).

Traffic: 2029 users visited in the last hour
FAQ
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.