The orientation of short reads when building a de Bruijn graph
6.1 years ago
934963534 ▴ 20

I know a node can represent both the original k-mer and its forward-reverse k-mer, but how should I deal with the fact that short reads can also be in different orientations relative to the reference? For example, given the reference genome AAACCT, should ACCT (TGGA) (forward) and TCCA (AGGT) (backward) also be considered the same node in the de Bruijn graph, or should they be split into two separate nodes?
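To make the orientation question concrete, here is a minimal Python sketch. The helper names `reverse_complement` and `kmers` are hypothetical, not from any library. It takes the 4-mers of the reference AAACCT and of a read sampled from the opposite strand. Such a read is the reverse complement of the reference, AGGTTT, and each of its 4-mers is the reverse complement of one of the forward 4-mers.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement (the other strand, read 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> list:
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


reference = "AAACCT"
k = 4
forward_read = reference                       # read sampled from the given strand
reverse_read = reverse_complement(reference)   # read sampled from the opposite strand

fwd = kmers(forward_read, k)
rev = kmers(reverse_read, k)
print(fwd)  # ['AAAC', 'AACC', 'ACCT']
print(rev)  # ['AGGT', 'GGTT', 'GTTT']

# Every k-mer of the reverse-strand read is the reverse complement of a forward k-mer.
assert sorted(rev) == sorted(reverse_complement(x) for x in fwd)
```

Note that neither TCCA nor TGGA appears in either list: a read from the other strand yields reverse complements, not plain reverses or plain complements.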

6.1 years ago

Generally one uses the "canonical k-mer" when building de Bruijn graphs. This is whichever of the k-mer and its reverse complement comes first alphabetically (or numerically first if you're representing k-mers as numbers). So in your example, ACCTTGGA would be stored, since it sorts before its reverse complement TCCAAGGT. You'll have to account for this when traversing the graph, of course, since a path can pass through a node in either orientation.
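A minimal sketch of canonical k-mers in Python, assuming a 2-bit encoding A=0, C=1, G=2, T=3 for the numeric variant (a common choice, but an assumption here). All function names are hypothetical. Because that encoding preserves the alphabetical order of the bases and packs the first base into the most significant bits, the numeric minimum picks the same k-mer as the alphabetical minimum. The last lines show that reads from either strand of AAACCT produce the same set of canonical nodes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Assumed 2-bit encoding; it preserves the order A < C < G < T.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Alphabetically smaller of the k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def encode(kmer: str) -> int:
    """Pack a k-mer into an integer, 2 bits per base, first base most significant."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def canonical_int(kmer: str) -> int:
    """Numerically smaller of the two encodings (k-mer vs. reverse complement)."""
    return min(encode(kmer), encode(reverse_complement(kmer)))


def canonical_nodes(reads, k: int) -> set:
    """Set of canonical k-mers (graph nodes) taken from reads of either orientation."""
    nodes = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            nodes.add(canonical(read[i:i + k]))
    return nodes


print(canonical("ACCTTGGA"))  # ACCTTGGA (its reverse complement is TCCAAGGT)
# A read from either strand of AAACCT gives the same node set.
print(canonical_nodes(["AAACCT"], 4) == canonical_nodes(["AGGTTT"], 4))  # True
# Alphabetical and numeric canonicalisation agree.
assert encode(canonical("ACCTTGGA")) == canonical_int("ACCTTGGA")
```

Storing only the canonical form halves the number of nodes, but the traversal code then has to remember which orientation of each node a path is currently using.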

So does that mean that, when traversing the graph, a node actually represents four cases: forward (ACCT), backward (TCCA), forward-reverse (TGGA), and backward-reverse (AGGT)? Wouldn't that cause more branches?

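I assumed you had 8-mers. A node never represents the plain reverse of a k-mer; it is either the sequence or its reverse complement. So for ACCT the node covers ACCT and its reverse complement AGGT only; TCCA and TGGA are different k-mers.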
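As a check on the four cases listed in the question, here is a small Python sketch that maps each variant of ACCT to its canonical node. The mapping of the question's labels to operations is an assumption read off its examples: "backward" is taken as the plain reverse and "forward-reverse" as the plain complement. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Plain complement, not reversed (the question's "forward-reverse")."""
    return seq.translate(COMPLEMENT)


def reverse(seq: str) -> str:
    """Plain reverse, not complemented (the question's "backward")."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    return complement(seq)[::-1]


def canonical(kmer: str) -> str:
    return min(kmer, reverse_complement(kmer))


kmer = "ACCT"
variants = {
    "forward": kmer,                               # ACCT
    "backward": reverse(kmer),                     # TCCA
    "forward-reverse": complement(kmer),           # TGGA
    "backward-reverse": reverse_complement(kmer),  # AGGT
}
for name, seq in variants.items():
    print(name, seq, "node =", canonical(seq))
# ACCT and AGGT share node ACCT; TCCA and TGGA would share a different node, TCCA.
```

The four cases collapse into two canonical nodes. Only the ACCT/AGGT node is supported by reads from AAACCT, so the scheme does not by itself create extra branches.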
