Should I use the Dorado basecaller to analyze my Nanopore data?
4 months ago

I am working on identifying bacterial communities in the fruit fly gut. In October we sequenced our samples with Nanopore technology, and the results showed no families belonging to a specific group of bacteria that we have studied in the lab for a while. I decided to re-analyze the data myself, but I am unsure what the correct approach is. Apparently Nanopore has a new basecaller called Dorado, and I have converted my FAST5 files to POD5 so that I can use it. From what I found online, its output is supposed to be a .cram file.

Should I convert the CRAM files to FASTA or FASTQ and then import them into QIIME 2 for taxonomic classification and visualization? Or should I skip the basecaller, clean the existing data with other tools, and then import it into QIIME 2?

To clean the data I would use the following:

  1. Remove the adapters using Porechop
  2. Trim and filter the reads with NanoFilt
  3. Filter sequences with fastp
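
For reference, the first two steps could be chained roughly as follows. This is only a sketch under assumptions: the function name, the `MIN_Q` and `MIN_LEN` defaults, and the file names are hypothetical, and it assumes `porechop` and `NanoFilt` are on the PATH. The fastp step is left out here because fastp is primarily aimed at short reads, so whether it adds anything after NanoFilt is worth checking first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# clean_reads IN.fastq(.gz) OUT.fastq.gz
# Step 1: Porechop strips adapters into a temporary FASTQ.
# Step 2: NanoFilt filters by mean quality and minimum length (reads stdin).
# MIN_Q and MIN_LEN are hypothetical defaults; tune them for your data.
clean_reads() {
    local in_fastq="$1" out_fastq="$2"
    local min_q="${MIN_Q:-10}" min_len="${MIN_LEN:-1000}"
    local trimmed="${out_fastq%.fastq.gz}.trimmed.fastq"

    porechop -i "$in_fastq" -o "$trimmed"
    NanoFilt -q "$min_q" -l "$min_len" < "$trimmed" | gzip > "$out_fastq"
    rm -f "$trimmed"
}
```

It would be called as `clean_reads reads.fastq.gz reads.clean.fastq.gz`, optionally with `MIN_Q=12 MIN_LEN=1200` set in the environment.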

You can just use Guppy for basecalling and get FASTQ output; there is no need to use Dorado.

4 months ago
GenoMax 141k

Dorado is now the preferred basecaller. Its output is an unaligned BAM file, though you can also get it to emit FASTQ data (if you prefer that).
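
If you already have Dorado's unaligned BAM and need FASTQ for downstream tools, a conversion along these lines should work. This is a minimal sketch that assumes samtools is installed; the function name and file names are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# bam_to_fastq CALLS.bam OUT.fastq.gz
# Converts an unaligned BAM to gzipped FASTQ. With no -0/-1/-2 options,
# samtools fastq writes all reads to stdout.
bam_to_fastq() {
    local bam="$1" out_fastq="$2"
    samtools fastq "$bam" | gzip > "$out_fastq"
}
```

For example: `bam_to_fastq calls.bam calls.fastq.gz`.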


As a side note, if you're using one of the base modification models, I've found it easier to have Dorado do the alignment as well. That lets you use the aligned BAM as direct input to modkit.
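
Roughly, that workflow looks like the sketch below. Treat it as an outline rather than a tested recipe: the function name, the `MODS` default, and all file names are placeholders, and the flags should be checked against the Dorado and modkit versions you have installed. The sort and index steps are there because the basecaller's aligned output is not coordinate-sorted.

```shell
#!/usr/bin/env bash
set -euo pipefail

# basecall_align_pileup MODEL POD5_DIR REFERENCE.fa PREFIX
# MODS is a hypothetical default; pick the modification model that fits
# your organism and chemistry.
basecall_align_pileup() {
    local model="$1" pod5_dir="$2" reference="$3" prefix="$4"
    local mods="${MODS:-5mCG_5hmCG}"

    # Basecall with modified bases and align in one step.
    dorado basecaller "$model" "$pod5_dir" \
        --modified-bases "$mods" \
        --reference "$reference" > "$prefix.bam"

    # modkit expects a sorted, indexed BAM.
    samtools sort -o "$prefix.sorted.bam" "$prefix.bam"
    samtools index "$prefix.sorted.bam"

    # Summarize modification calls per reference position.
    modkit pileup "$prefix.sorted.bam" "$prefix.pileup.bed"
}
```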


I will keep this in mind, thank you! Also, since I have the FASTQ files that were provided by Nanopore, is it better to just clean those, or do you recommend going back to the raw data and starting from basecalling?


Dorado is now the default basecaller in MinKNOW. If a recent version was used, Dorado may already have been used for your run. In that case you can start working with the FASTQ files directly.


Thank you for the reply! I am reading through the Dorado documentation and do not see an option that would let me emit FASTQ data. Do you perhaps have a link or the command that I can use?

dorado basecaller --emit-fastq model_file POD5_folder > file.fastq

That is with v0.4.3. I see that 0.5.0 is out; this software is updated frequently.

3 months ago

As your sequencing was done in October, you won't get much value out of re-basecalling. It makes sense to re-basecall if your data are much older, e.g. I got much better data (~4% higher percent identity) by re-basecalling 2021 data in 2023 with the appropriate model.

But re-basecalling 3 months later likely makes little sense.

I would focus my efforts on using other metagenomic long-read binners with different databases behind them to find your bacteria of interest, if present.

Is this 16S or whole-metagenome data? For 16S, try https://gitlab.com/treangenlab/emu
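
If it is 16S, an Emu run could look something like this. The flags are written from memory of the Emu README and should be verified with `emu abundance --help`; the wrapper name and the thread default are hypothetical. It also assumes a reference database is already set up (passed with `--db` or through the `EMU_DATABASE_DIR` environment variable).

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_emu READS.fastq OUT_DIR [THREADS]
# Estimates relative abundances from full-length 16S Nanopore reads.
# Assumes the Emu database location is already configured.
run_emu() {
    local fastq="$1" out_dir="$2" threads="${3:-4}"
    emu abundance "$fastq" \
        --type map-ont \
        --output-dir "$out_dir" \
        --threads "$threads"
}
```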


The "appropriate model" part is an important point. It'd be worth checking whether the reads were basecalled with a HAC or SUP model rather than just a fast model.
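
One way to check is to look at the read headers. The sketch below assumes an uncompressed FASTQ whose header lines carry a `basecall_model_version_id=` field, which recent MinKNOW/Guppy/Dorado output usually includes; older files may not have it, in which case nothing is printed. Both function names are made up for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the basecalling model recorded in the first FASTQ header, if any.
# Assumption: the header has a "basecall_model_version_id=" field.
model_from_fastq() {
    local fastq="$1"
    head -n 1 -- "$fastq" \
        | tr ' \t' '\n\n' \
        | sed -n 's/^basecall_model_version_id=//p'
}

# Classify a model name as fast, hac, or sup based on its name.
model_tier() {
    case "$1" in
        *_fast*) echo fast ;;
        *_hac*)  echo hac ;;
        *_sup*)  echo sup ;;
        *)       echo unknown ;;
    esac
}
```

Usage would be `model_tier "$(model_from_fastq reads.fastq)"`. For Dorado BAM output, the model is normally recorded in the `@RG` header lines instead, which `samtools view -H calls.bam` will show.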
