Question

How to convert raw Nanopore R9 fast5 files to fastq files?

Asked 4 months ago by Lélé

Hello,

I am new to processing Nanopore sequencing data and have run into an issue:

I have binary fast5 files straight from the Nanopore sequencer (R9 flow cell), and I would like to basecall them with Dorado, as it seems to be the preferred tool. I have already used Dorado for simplex basecalling on pod5 files, passing "hac" as the model as described on the nanoporetech/dorado GitHub page.

However, I can't seem to make it work on fast5 files, even though the documentation says they are supported for simplex basecalling (though less performant). This is what I type in my terminal:

$ dorado basecaller hac /directory/to/my/fast5/files --emit-fastq > output.fastq

I keep getting this error:

[error] Cannot automate model selection using fast5 files

I have also tried "fast" and "sup" instead of "hac" in case it made a difference, but to no avail.

Is there a specific model I should use or download? Are there any other tools you could recommend for basecalling from fast5 files? I know about Guppy, but I am unable to download it because it is an ONT tool.

Any help would be greatly appreciated.

Thanks,

Lele

dorado fast5 basecalling nanopore • 1.5k views


Answer by Dave Carlson, 4 months ago

Your best bet is probably to convert your fast5 files to pod5. This can be done with the pod5 convert fast5 command from the pod5 tools:

https://pod5-file-format.readthedocs.io/en/latest/docs/tools.html#pod5-convert-fast5

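A minimal sketch of this conversion route as a POSIX sh function, assuming the fast5 files sit directly in one directory and that pod5 and dorado are on PATH. The function names, the output prefix, and the DRY_RUN switch (which prints the commands instead of running them) are hypothetical. Because it is not certain that speed-only selection ("hac") recognises converted R9.4.1 data in every dorado version, the sketch passes the explicit R9.4.1 model name that comes up later in this thread.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): fast5 -> pod5 -> FASTQ.
# Assumes `pod5` and `dorado` are on PATH; DRY_RUN=1 only prints commands.

# run CMD...: execute CMD, or print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# run_to FILE CMD...: execute CMD with stdout redirected to FILE
# (printed as "CMD > FILE" when DRY_RUN=1).
run_to() {
  local out=$1
  shift
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s > %s\n' "$*" "$out"
  else
    "$@" > "$out"
  fi
}

# fast5_dir_to_fastq IN_DIR PREFIX [MODEL]
#   IN_DIR  directory holding the *.fast5 files (not searched recursively)
#   PREFIX  output path prefix; writes PREFIX.pod5 and PREFIX.fastq
#   MODEL   full dorado model name (default: the R9.4.1 hac model named in the thread)
fast5_dir_to_fastq() {
  local in_dir=$1 prefix=$2 model=${3:-dna_r9.4.1_e8_hac@v3.3} f
  # Rebuild the positional parameters as the list of existing fast5 files;
  # the -e test drops the literal pattern left behind when nothing matches.
  set --
  for f in "$in_dir"/*.fast5; do
    [ -e "$f" ] && set -- "$@" "$f"
  done
  if [ "$#" -eq 0 ]; then
    echo "no .fast5 files found in $in_dir" >&2
    return 1
  fi
  # Merge all fast5 inputs into a single pod5 file.
  run pod5 convert fast5 "$@" --output "$prefix.pod5" || return 1
  # Basecall the pod5 file, writing FASTQ records to PREFIX.fastq.
  run_to "$prefix.fastq" dorado basecaller "$model" "$prefix.pod5" --emit-fastq
}
```

Running it with DRY_RUN=1 first is a cheap way to check which files will be picked up before launching the real conversion.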

Thank you for your help. However, I was hoping to compare both chemistries: the R9 flow cell data, which is in fast5 format, and the R10 data, which is in pod5. Wouldn't converting fast5 to pod5 bias the comparison?

Reply by Lélé, 4 months ago

I don't think so. POD5 is just a file format (like fast5); converting does not alter the raw signal, and dorado reads POD5 much more efficiently. I don't recall the exact number, but re-basecalling on a GPU was several times faster.

Reply by GenoMax, 4 months ago

score 1 · Accepted Answer · 2024-03-01

Answer by Carlo Yague, 4 months ago

According to the documentation:

Dorado can automatically select a basecalling model using a selection of model speed (fast, hac, sup) and the pod5 data. This feature is not supported for fast5 data. If the model does not exist locally, dorado will automatically download the model and delete it when finished. To re-use downloaded models, manually download models using dorado download.

So in your case, try downloading the models locally first:

dorado download --model all

Then run the basecalling:

dorado basecaller hac@latest /directory/to/my/fast5/files --emit-fastq > output.fastq

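A sketch of this "download first, then basecall" route for fast5 input, as a POSIX sh function. As the rest of the thread shows, the model argument has to be a full model name such as dna_r9.4.1_e8_hac@v3.3: bare speeds (hac) and speed@version forms (hac@latest, hac@v4.2.0) still go through automatic model selection, which dorado rejects for fast5. Only the dorado download --model and dorado basecaller ... --emit-fastq invocations come from the thread; the function name and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): download an explicit dorado model, then
# basecall fast5 input with it. DRY_RUN=1 only prints the commands.

# run CMD...: execute CMD, or print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# run_to FILE CMD...: execute CMD with stdout redirected to FILE.
run_to() {
  local out=$1
  shift
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s > %s\n' "$*" "$out"
  else
    "$@" > "$out"
  fi
}

# basecall_fast5 MODEL INPUT OUT_FASTQ
basecall_fast5() {
  local model=$1 input=$2 out=$3
  # Speed-only names (and speed@version) trigger automatic model selection,
  # which dorado does not support for fast5 input, so refuse them early.
  case "$model" in
    fast | hac | sup | fast@* | hac@* | sup@*)
      echo "error: '$model' relies on automatic model selection; use a full model name" >&2
      return 1
      ;;
  esac
  # Download the model once so later runs can re-use it.
  run dorado download --model "$model" || return 1
  # Basecall with the explicit model and write FASTQ to OUT_FASTQ.
  run_to "$out" dorado basecaller "$model" "$input" --emit-fastq
}
```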

Thank you for your reply. I have just tried this, and also hac@v4.2.0, but I still get the same error... I guess I'll have to find another way.

Reply by Lélé, 4 months ago

Have you tried specifying the exact model? For instance, dna_r9.4.1_e8_hac@v3.3 for an R9 flow cell.

Reply by Carlo Yague, 4 months ago

I just replaced hac with the full name of one of the downloaded models and it works. Silly mistake on my part: as the model complex table says, hac@latest still selects a model automatically, which is not supported for fast5 input.

Thanks again!

Reply by Lélé, 4 months ago
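Putting the thread's conclusion together: speed-based selection works for pod5 input, but fast5 input needs an explicit model name. The sketch below picks the model argument from the input type. It assumes, as in this thread, that any fast5 data is R9.4.1 and that dna_r9.4.1_e8_hac@v3.3 is the wanted model for hac; other speeds are deliberately left unmapped rather than guessing model names. The function names and the R9_HAC_MODEL override variable are hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): choose the model argument for
# `dorado basecaller` depending on whether the input contains fast5 files.

# has_fast5 PATH: succeed if PATH is a .fast5 file or a directory holding one.
has_fast5() {
  case "$1" in
    *.fast5) [ -f "$1" ] && return 0 ;;
  esac
  [ -d "$1" ] && [ -n "$(find "$1" -name '*.fast5' -print | head -n 1)" ]
}

# pick_model INPUT SPEED: print the model argument to pass to dorado.
pick_model() {
  local input=$1 speed=$2
  if ! has_fast5 "$input"; then
    # Assumed pod5 input: let dorado select the model from the speed alone.
    printf '%s\n' "$speed"
    return 0
  fi
  # fast5 input: automatic selection is unsupported, so map to a full name.
  case "$speed" in
    hac) printf '%s\n' "${R9_HAC_MODEL:-dna_r9.4.1_e8_hac@v3.3}" ;;
    *)
      echo "error: no full R9 model name configured for '$speed'" >&2
      return 1
      ;;
  esac
}
```

Usage, mirroring the command from the question: dorado basecaller "$(pick_model /directory/to/my/fast5/files hac)" /directory/to/my/fast5/files --emit-fastq > output.fastq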