Can anybody explain how an FM-index is built for a sequence file with multiple entries (say, a FASTA file for a reference genome containing multiple chromosomes)? In other words, how does the FM-index distinguish the different sequences in a single FASTA file?
I have been studying the Burrows-Wheeler transform (BWT) and the FM-index for sequence manipulation, and I think I understand both for a single string/sequence. What I cannot work out is the multi-sequence case:
1) All the code I can find on GitHub uses a single string/sequence as its example, and so does the original FM-index paper.
2) A common strategy seems to be to concatenate the sequences into a single one, which brings me back to my original question: how does the FM-index then distinguish the original entries? By remembering the offset at which each one was joined?
3) The two programs bwa and bowtie2 are too complicated for me to follow in detail.
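To make point 2 concrete, here is a minimal sketch of what I imagine the "remember the offsets" approach looks like. It is only my guess, not how bwa or bowtie2 actually do it. The names `ConcatenatedText`, `concatenate` and `locate` are made up for illustration, and I am assuming a `'$'` separator that never occurs inside a sequence.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type: the joined text plus the start offset of each entry.
struct ConcatenatedText {
    std::string text;                 // all sequences joined, each followed by a separator
    std::vector<std::size_t> starts;  // starts[i] = offset of sequence i within text
};

// Join the sequences into one string, recording where each one begins.
// Assumption: the separator character does not occur inside any sequence.
ConcatenatedText concatenate(const std::vector<std::string>& seqs, char sep = '$') {
    ConcatenatedText c;
    for (const std::string& s : seqs) {
        c.starts.push_back(c.text.size());
        c.text += s;
        c.text += sep;
    }
    return c;
}

// Map a position in the joined text back to (sequence index, offset in that sequence).
// If pos lands on a separator, the returned offset equals that sequence's length.
std::pair<std::size_t, std::size_t> locate(const ConcatenatedText& c, std::size_t pos) {
    assert(pos < c.text.size());
    // Find the first start offset greater than pos; the entry just before it contains pos.
    auto it = std::upper_bound(c.starts.begin(), c.starts.end(), pos);
    std::size_t idx = static_cast<std::size_t>(it - c.starts.begin()) - 1;
    return {idx, pos - c.starts[idx]};
}
```

The idea would be to build the FM-index over `text` exactly as for a single string, then pass every text position the index reports for a match through `locate` to recover which entry it came from and where. A match would presumably also have to be checked so that it does not run across a separator into the next entry.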
My goal is to understand how a C/C++ implementation builds the FM-index for multiple sequences/strings. Thanks in advance.