Forum:Suffix Arrays: looking for an instructive algorithm of SA construction for teaching
Entering edit mode
3.4 years ago
Michael 53k

Within a lecture series, I am planning to delve into suffix arrays as a versatile data structure for bioinformatics. In particular, I would also like to show an efficient algorithm for their construction (sorting, LCP construction). The algorithm should be of linear time complexity (O(n)), need not be linear space complexity. The algorithm should be easy enough to depict at least the basic ideas in a comprehensible way. I have looked into some recent ones, e.g. Li et al. (2016) and a paper on DivSufSort. However these papers are very technical (libDivSufSort >1000 LOC), and I don't think they are suitable for presentation in a single class.

Do you have any ideas? Presenting the basic principle in an instructive fashion would possibly be good enough. This is an undergrad (MSc-level) course in "genome scale algorithms".

A little update here: I am looking now into the DC3 algorithm (only ~50 lines of code, not trivial but relatively straight forward, I think I almost got it), and SA-IS (~100 LOC). If someone knows about some good slides on those, that will be highly appreciated.

Also, I don't want to use an algorithm that starts from suffix-trees,because these have little practical relevance.

suffix-arrays linear-time teaching algorithm • 875 views
Entering edit mode

Thanks for the links, they look good for self-study, however, I should maybe also specify that I was looking for an algorithm that does not require building from a suffix tree first, because these have large space requirements, and, in fact, I have not introduced trees first.


Login before adding your answer.

Traffic: 2786 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6