K-Mer Counting And Constructing Bwt Index In String Graph Assembler (Sga)
1
1
Entering edit mode
8.5 years ago
ugly.betty77 ★ 1.1k

While going through the code of string graph assembler (SGA) and especially Heng Li's implementation of BCR algorithm for fast indexing of sequence libraries (https://raw.github.com/lh3/ropebwt/master/bcr.c), I came to realize that constructing Burrows Wheeler transform in SGA has many similarities with k-mer counting in de Bruijn graph-based genome assembler. In my naive understanding of BWT (but not BCR code), constructing BWT index is likely to take more time in regions with highly repetitive k-mers. Does anyone has further insight into the process? Are these kind of code and algorithm-related questions appropriate for Biostar?

• 3.3k views
ADD COMMENT
3
Entering edit mode

Yes, bioinformatics code and algorithm questions are entirely appropriate. It helps if questions are specific, "further insight into the process" is rather vague.

ADD REPLY
1
Entering edit mode
8.1 years ago
Ketil 4.1k

Although I'm not an expert on this, constructing the BWT is essentially the same as constructing a suffix array. Since there are linear time algorithms for this, I don't think repetitive regions will necessarily be slower. But it is possible that there are other algorithms that are used because they are faster in practice, but slow down in specific cases (much like quicksort).

ADD COMMENT
0
Entering edit mode

"Since there are linear time algorithms for this"

Thank you Ketil. I do not think the matter is that simple. Constructing BWT for a large read library is not trivial and a number of papers came out last year, mostly from Anthony Cox and colleagues. Heng Li released his ropebwt2 algorithm a week back related to the same topic, but I have not got time to study it in enough detail to comment.


ADD REPLY

Login before adding your answer.

Traffic: 2356 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6