How to create a modified basecalling dataset from nanopore data with an ambiguous motif sequence?
11 months ago
swim1128 • 0

What can I use to create a modified basecalling dataset from nanopore data when my motif has ambiguous bases? Remora is the usual tool for this, but my motif contains ambiguous bases (GTNNaNNTGG, with the modified base at position 5), and Remora can't handle them, so I am unable to prepare the chunk dataset.

dorado nanopore remora • 881 views

What can I use to create a modified basecalling dataset from nanopore data

What do you mean by this? You call modified bases from the list that dorado supports. Currently it supports m6A_DRACH, 6mA, m5C, 5mC, inosine_m6A, 5mCG_5hmCG, m6A, 5mCG, pseU, 5mC_5hmC, 4mC_5mC. Where does the motif come in?

12 hours ago

Hi swim1128,

Since Remora is not accepting the ambiguous bases in your motif during chunk dataset preparation, a straightforward workaround is to expand the degenerate motif (GTNNaNNTGG) into all of its concrete 10-mers. Assuming the lowercase 'a' marks the modified A, replace each of the four N positions with A, C, G, or T, which gives 4^4 = 256 variants. Then pass every variant as its own motif when you prepare the dataset. As far as I know, recent Remora releases do this with `remora dataset prepare` and a repeatable `--motif MOTIF FOCUS_POSITION` option rather than a single comma-separated argument. The focus position is 0-based, so your "position 5" becomes 4. Check `remora dataset prepare --help` for your installed version, since the CLI has changed between releases. Once the chunks are prepared, proceed with training as usual.
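The expansion is easy to script. Below is a minimal Python sketch, assuming that only `N` is ambiguous in the motif and that the lowercase `a` is the modified A. The helper `expand_motif` is a hypothetical name, and the `--motif MOTIF FOCUS_POSITION` argument layout is an assumption about your Remora version that should be confirmed against its `--help` output.

```python
from itertools import product


def expand_motif(motif: str, alphabet: str = "ACGT") -> list[str]:
    """Expand every N in `motif` into each base of `alphabet`."""
    motif = motif.upper()
    # Each position offers either its single fixed base or the full alphabet.
    choices = [alphabet if base == "N" else base for base in motif]
    return ["".join(combo) for combo in product(*choices)]


motif = "GTNNaNNTGG"
focus = 4  # 0-based index of the lowercase 'a' (position 5 when counting from 1)

variants = expand_motif(motif)
print(len(variants))  # 256

# Assumed CLI layout: one "--motif <sequence> <focus>" triple per variant.
motif_args = [arg for v in variants for arg in ("--motif", v, str(focus))]
print(" ".join(motif_args[:6]))
```

The resulting `motif_args` list can be appended to the rest of your dataset preparation command, for example through `subprocess.run`, so you do not have to type 256 motifs by hand.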

Kevin
