Question: Are there advantages in using high accuracy models for guppy basecaller when Illumina data are available?
ikangkim wrote (15 months ago):

Hi,

I'm planning to sequence bacterial genomes using both Nanopore and Illumina platforms to get nearly complete and accurate genomes. After getting sequence data, I'm going to perform hybrid assembly (e.g. Unicycler) or long-read assembly followed by short-read polishing.

In this case, is there any advantage to using the high-accuracy models (e.g. dna_r9.4.1_450bps_hac.cfg) for guppy basecalling? Or would the fast models (e.g. dna_r9.4.1_450bps_fast.cfg) be enough?

I'm testing guppy's speed on my Ubuntu 18.04 machine with a GTX 1660 (CUDA 10.1), and the fast models appear to be more than 10X faster than the high-accuracy models.

Thanks.

sequencing assembly genome • 1.2k views
colindaven (Hannover Medical School) wrote (15 months ago):

In my limited experience, bacterial genomes seem to have higher accuracy than vertebrate genomes in both fast and hac modes (maybe because the DNA is fresher and of higher quality?).

I would do hac mode followed by Illumina polishing. Maybe you only get a 1% accuracy increase over fast mode, but 1% is worth having and is going to cause a LOT fewer problems downstream.

My speed comparisons indicate a ~7X difference between fast and hac modes on CPU.


Thank you for the reply.

I also feel that the hac models would be better for downstream analyses. I think I should spend more time optimizing the guppy parameters. Currently, the hac model is >10X (GPU) or >20X (CPU) slower than the fast model on my machine.

Reply written 15 months ago by ikangkim

Apart from the obvious (CPU only), I haven't really optimised anything yet.

I might be wrong, as I haven't done a comparative analysis of different setups; we should do more testing.

I split the fast5 files into groups of 5 per subdir, then submit each folder to a Slurm cluster. I specify 10 Slurm threads per job but set $cpus in the code below to 8.

The code is at https://github.com/colindaven/guppy_on_slurm

# $i is one fast5 subdirectory; $cpus is the number of callers (8 here)
# high accuracy, ~7x slower (40+ hours)
guppy_basecaller -i "$i" -s "$i.guppy" --cpu_threads_per_caller 1 --num_callers "$cpus" -c dna_r9.4.1_450bps_hac.cfg
# fast, lower accuracy, ~7x faster (6 hours?)
# guppy_basecaller -i "$i" -s "$i.guppy" --cpu_threads_per_caller 1 --num_callers "$cpus" -c dna_r9.4.1_450bps_fast.cfg
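
The batching step described above (groups of 5 fast5 files per subdirectory) could be sketched roughly as follows. This is only an illustration, not the code from the linked repository: the function name split_fast5, the batch_NNNN directory naming, and the run_guppy.sh job script in the comment are all hypothetical.

```shell
# Hypothetical helper: move *.fast5 files from directory $1 into numbered
# subdirectories (batch_0000, batch_0001, ...) holding at most $2 files each
# (default 5). Prints the number of batch directories created.
split_fast5() {
    sf_src="$1"
    sf_per_dir="${2:-5}"
    sf_count=0
    sf_batch=0
    sf_dest=""
    for sf_file in "$sf_src"/*.fast5; do
        [ -e "$sf_file" ] || continue   # glob matched nothing
        # Start a new batch directory every sf_per_dir files.
        if [ $(( sf_count % sf_per_dir )) -eq 0 ]; then
            sf_dest=$(printf '%s/batch_%04d' "$sf_src" "$sf_batch")
            mkdir -p "$sf_dest"
            sf_batch=$(( sf_batch + 1 ))
        fi
        mv -- "$sf_file" "$sf_dest/"
        sf_count=$(( sf_count + 1 ))
    done
    echo "$sf_batch"
}

# Possible follow-up (assumption: run_guppy.sh is your own job script that
# calls guppy_basecaller on the directory passed as its first argument):
# for d in /path/to/fast5/batch_*; do
#     sbatch --cpus-per-task=10 run_guppy.sh "$d"
# done
```

Each batch directory can then be passed as $i to the guppy_basecaller command above.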

If you can optimize further please let me know.

Reply written 15 months ago by colindaven

I've been trying to optimize several guppy parameters.

So far, the settings below have been the fastest on GPU, but the speedup was only ~20-25% compared to the defaults.

$ guppy_basecaller -i /fast5 -s /guppy -c dna_r9.4.1_450bps_hac.cfg -x "cuda:0" --gpu_runners_per_device 4 --num_callers 4 --chunks_per_runner 2048

I haven't tried different settings for CPU, because the GPU with default settings was slightly faster than the CPU even when I used 72 threads (of the 80 available on a dual Xeon Gold 6230). Unfortunately, I have no access to a cluster.
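
For anyone comparing settings like these, a small timing wrapper makes the runs easier to compare. The sketch below is only an illustration: the time_cmd helper, the /fast5_subset input path, and the swept values are assumptions, and only flags already shown in the command above are used.

```shell
# Hypothetical helper: run the given command, discard its output,
# and print the elapsed wall-clock time in whole seconds.
time_cmd() {
    tc_start=$(date +%s)
    "$@" >/dev/null 2>&1
    tc_end=$(date +%s)
    echo $(( tc_end - tc_start ))
}

# Example sweep over --chunks_per_runner (commented out; the input path
# and the values tried are placeholders, not recommendations):
# for chunks in 512 1024 2048; do
#     secs=$(time_cmd guppy_basecaller -i /fast5_subset -s "/guppy_$chunks" \
#         -c dna_r9.4.1_450bps_hac.cfg -x "cuda:0" \
#         --gpu_runners_per_device 4 --num_callers 4 \
#         --chunks_per_runner "$chunks")
#     echo "chunks_per_runner=$chunks: ${secs}s"
# done
```

Running each setting on the same small subset of fast5 files keeps the comparison fair and the sweep short.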

Reply written 15 months ago by ikangkim