Question

Forum:Suffix Arrays: looking for an instructive algorithm of SA construction for teaching

0

Entering edit mode

4.0 years ago

Michael 54k

Within a lecture series, I am planning to delve into suffix arrays as a versatile data structure for bioinformatics. In particular, I would also like to show an efficient algorithm for their construction (sorting, LCP construction). The algorithm should be of linear time complexity (O(n)), need not be linear space complexity. The algorithm should be easy enough to depict at least the basic ideas in a comprehensible way. I have looked into some recent ones, e.g. Li et al. (2016) and a paper on DivSufSort. However these papers are very technical (libDivSufSort >1000 LOC), and I don't think they are suitable for presentation in a single class.

Do you have any ideas? Presenting the basic principle in an instructive fashion would possibly be good enough. This is an undergrad (MSc-level) course in "genome scale algorithms".

A little update here: I am looking now into the DC3 algorithm (only ~50 lines of code, not trivial but relatively straight forward, I think I almost got it), and SA-IS (~100 LOC). If someone knows about some good slides on those, that will be highly appreciated.

Also, I don't want to use an algorithm that starts from suffix-trees,because these have little practical relevance.

suffix-arrays linear-time teaching algorithm • 1.0k views

ADD COMMENT • link updated 10 months ago by Ram 43k • written 4.0 years ago by Michael 54k

0

Entering edit mode

https://www.geeksforgeeks.org/pattern-searching-using-suffix-tree/
https://www.geeksforgeeks.org/suffix-array-set-1-introduction/

ADD REPLY • link 4.0 years ago by GenoMax 141k

0

Entering edit mode

Thanks for the links, they look good for self-study, however, I should maybe also specify that I was looking for an algorithm that does not require building from a suffix tree first, because these have large space requirements, and, in fact, I have not introduced trees first.

ADD REPLY • link 4.0 years ago by Michael 54k