Happy New Year! I'm struggling to understand how, in this example, they can determine the start and end of the sequence when the de Bruijn graph ends up being circular. For example, how do you know the sequence is ATGCTAGCAC vs TGCTAGCACA? Can you differentiate these from a circular de Bruijn graph? Thanks!
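No, you can't differentiate them. When the graph is circular, an Eulerian cycle can start at any edge, so every rotation of the sequence is equally consistent with it. More generally, a de Bruijn graph (DBG) can have multiple Eulerian paths (i.e., multiple assemblies are possible from the graph above). Longer k-mers alleviate this problem somewhat, because fewer repeats collapse into shared nodes.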
For assembly, you report contigs instead, since those non-branching stretches are at least unambiguous.
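If it helps to see the ambiguity concretely, here is a rough Python sketch (not taken from the example in the question; the helper names `cyclic_kmers`, `kmers`, and `reconstructions` are made up for illustration). It assumes the circular graph comes from reading k-mers around a circle, checks that the two sequences from the question then give identical k-mer multisets, and brute-forces every sequence spelled by an Eulerian path for a small and a larger k. The brute force is only practical for toy inputs.

```python
from collections import Counter


def cyclic_kmers(seq, k):
    """Multiset of k-mers read around a circular sequence (assumed circular)."""
    wrapped = seq + seq[:k - 1]
    return Counter(wrapped[i:i + k] for i in range(len(seq)))


def kmers(seq, k):
    """List of k-mers of a linear sequence, with repeats kept."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def reconstructions(kmer_list, k):
    """All distinct sequences spelled by Eulerian paths that use every k-mer once.

    Each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix.
    Brute-force backtracking, fine only for toy examples.
    """
    remaining = Counter(kmer_list)  # k-mer -> unused copies
    total = len(kmer_list)
    results = set()

    def extend(seq, used):
        if used == total:
            results.add(seq)
            return
        suffix = seq[len(seq) - (k - 1):]  # current node
        for base in "ACGT":
            kmer = suffix + base
            if remaining[kmer] > 0:
                remaining[kmer] -= 1
                extend(seq + base, used + 1)
                remaining[kmer] += 1  # backtrack

    for start in set(kmer_list):
        remaining[start] -= 1
        extend(start, 1)
        remaining[start] += 1
    return results


a, b = "ATGCTAGCAC", "TGCTAGCACA"

# Read circularly, both rotations yield the same k-mers, hence the same graph.
print(cyclic_kmers(a, 3) == cyclic_kmers(b, 3))  # True

# Linear case: k=2 admits several assemblies, k=3 pins down a single one here.
print(len(reconstructions(kmers(a, 2), 2)) > 1)  # True
print(reconstructions(kmers(a, 3), 3))           # {'ATGCTAGCAC'}
```

With k=2, for instance, AGCTATGCAC uses exactly the same 2-mers as ATGCTAGCAC, so the graph alone cannot tell them apart. With k=3 the repeat is resolved for this particular string.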