Question

How to create a modified basecalling dataset from nanopore data with an ambiguous motif sequence?

11 months ago

swim1128 • 0

What can I use to create a modified basecalling dataset from nanopore data when my motif has ambiguous bases? Remora is the usual tool for this, but my motif (GTNNaNNTGG, with the modified base at position 5) contains ambiguous bases, which Remora can't handle. As a result, I am unable to prepare the chunk dataset.

dorado nanopore remora • 881 views

updated 12 hours ago by Kevin Blighe 89k • written 11 months ago by swim1128 • 0

What can I use to create a modified basecalling dataset from nanopore data

What do you mean by this? You call modified bases from a list dorado supports. Currently it supports m6A_DRACH, 6mA, m5C, 5mC, inosine_m6A, 5mCG_5hmCG, m6A, 5mCG, pseU, 5mC_5hmC, 4mC_5mC. Where does motif come in?

11 months ago by GenoMax 154k

score 0 · Answer 1 · 2025-11-07

Hi swim1128,

Remora indeed cannot handle ambiguous bases in motifs directly during chunk dataset preparation; as far as I know, its motif matching expects unambiguous sequences. A straightforward workaround is to expand your degenerate motif (GTNNaNNTGG) into every specific 10-mer it can represent: replace each N with A, C, G, or T, and keep the lowercase 'a' as A (assuming it marks the modified adenine). With four N positions this gives 4^4 = 256 variants, and the modified base stays at position 5 in each one (index 4 if you count from 0).

You can then supply all of the variants as motifs when preparing the chunk dataset. In the Remora releases I am familiar with, that means running remora dataset prepare with a --motif <sequence> <focus position> pair repeated once per motif, where the focus position is 0-based. The exact command and flag names may differ between versions, so check the help output of your installed version. Once the chunks are prepared, proceed with training as usual.
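The expansion itself is easy to script. The following is a minimal sketch, not part of Remora: the function name expand_motif and the convention that a single lowercase letter marks the modified base are assumptions made for illustration. It uses the standard IUPAC nucleotide codes, so it also covers degenerate symbols other than N.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_motif(motif):
    """Expand a degenerate motif into all concrete sequences.

    Assumption (hypothetical convention): exactly one lowercase letter
    marks the modified base. Returns (variants, focus), where focus is
    the 0-based index of that base.
    """
    focus_positions = [i for i, ch in enumerate(motif) if ch.islower()]
    if len(focus_positions) != 1:
        raise ValueError("expected exactly one lowercase focus base")
    try:
        choices = [IUPAC[ch] for ch in motif.upper()]
    except KeyError as err:
        raise ValueError(f"not an IUPAC nucleotide code: {err}") from None
    # Cartesian product over the allowed bases at each position.
    variants = ["".join(bases) for bases in product(*choices)]
    return variants, focus_positions[0]


variants, focus = expand_motif("GTNNaNNTGG")
print(len(variants), "variants, focus index", focus)
for seq in variants[:3]:
    print(seq, focus)
```

Each printed line pairs one concrete 10-mer with the 0-based focus index. How those pairs are handed to Remora depends on the command-line interface of the version you have installed, so the output format here is only a starting point.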

Kevin