Question: Processing Raw Nanopore Data
anastasiia.satyr (Poland/Poznan/Institute of Bioorganic Chemistry) wrote, 6 weeks ago:

Hello,

I performed a MinION sequencing run using the MinKNOW software, and 4 days after sequencing finished only 24% of the bases had been called. So I decided to do the base calling on a more powerful computer using Guppy, but I am not sure which data I should use as input. The input file format for Guppy is .fast5.

I suppose that .fast5 files are generated only after base calling, and if I continue base calling via MinKNOW, it will take about 2 weeks to complete.

I also found a "queued_reads" folder in my data. It contains 512 folders (which I suppose correspond to the 512 channels of the MinION flow cell), and these folders contain files in .raw format. Is there any software that can continue base calling with these .raw files as input? Or is the only way to wait until 100% of the .fast5 files are generated?

Thank you, Anastasiia

modified 6 weeks ago by WouterDeCoster • written 6 weeks ago by anastasiia.satyr
WouterDeCoster (Belgium) wrote, 6 weeks ago:

Hi Anastasiia,

Unless you have an application that benefits from real-time analysis, I would recommend turning live basecalling off and, indeed, using a more powerful computer (ideally with GPUs) for Guppy basecalling. With live basecalling turned off you should get fast5 files directly, with no raw intermediate. You are not the first to have issues with .raw files, and it is a problem that was only recently fixed. Do you have access to the nanopore community forum? If so, see the following thread: https://community.nanoporetech.com/posts/how-to-convert-raw-to-fa One hour ago, Andrew Goodall posted some guidelines there on how to "recover" the .raw files.
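As a rough illustration of what offline basecalling of a fast5 directory could look like, here is a minimal sketch. It only prints the command instead of running it. The directory paths and the chosen config file are placeholders (assumptions), not values from this run; `--input_path`, `--save_path` and `--config` are the long forms of Guppy's `-i`, `-s` and `-c` options.

```shell
# Sketch: print (not run) a guppy_basecaller call for a directory of raw .fast5 files.
# All paths and the config name used below are placeholders, not values from this thread's run.
basecall_cmd() {
    # $1 = directory with .fast5 files, $2 = output directory, $3 = config file
    printf 'guppy_basecaller --input_path %s --save_path %s --config %s\n' "$1" "$2" "$3"
}

# Inspect the printed command first, then paste it into a terminal to start basecalling.
basecall_cmd /data/run1/fast5 /data/run1/basecalled dna_r9.4.1_450bps_hac.cfg
```

Printing the command first makes it easy to check the paths before committing to a run that may take many hours.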

Cheers,
Wouter

modified 6 weeks ago • written 6 weeks ago by WouterDeCoster

Hi Wouter,

I did as you advised and I am now at the base calling stage. My next question is quite specific: how do I get the best performance out of my computer when running Guppy? I have a Linux system with 16 cores and 2 threads per core (and 125 GB RAM). How do I specify GPU base calling? For this run I added the optional parameter --cpu_threads_per_caller 16, but when I came to work the next day only 10% of the process had finished. Moreover, I found in the Guppy documentation that "if GPU base calling is run, modification of number of CPU threads per caller is not effective". If so, is there any other way to increase the base calling speed?

Many thanks, Anastasiia

Reply written 6 weeks ago by anastasiia.satyr

Do you have access to the nanopore community forum? More about guppy can be found there.

For running on a GPU you need to set the --device parameter, but I'm not sure how you should do that correctly on your system. When I basecall, it is on the PromethION, and there it is --device "cuda:0 cuda:1 cuda:2 cuda:3".
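To make the pattern in that --device value explicit, here is a small sketch that builds the string from a GPU count. The helper name `device_string` is made up for illustration, and whether a given card works with Guppy at all is a separate question that this does not answer.

```shell
# Sketch: build the value for guppy's --device option from a GPU count.
device_string() {
    n=$1
    i=0
    out=""
    while [ "$i" -lt "$n" ]; do
        # Append "cuda:<i>", separated by a space from any earlier entries.
        out="${out}${out:+ }cuda:${i}"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# Counting the GPUs with `nvidia-smi -L | wc -l` assumes the NVIDIA driver is installed;
# a fixed count is used here so the sketch runs anywhere.
device_string 4   # prints: cuda:0 cuda:1 cuda:2 cuda:3
```

On a machine with a single supported GPU this would reduce to --device "cuda:0".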

Reply written 6 weeks ago by WouterDeCoster

The only thing I managed to find on the Community forum is the documentation for Guppy, but I guess I should start with the settings of my video card in order to run GPU base calling. Anyway, thanks a lot for the answer!

Reply modified 6 weeks ago • written 6 weeks ago by anastasiia.satyr

Can you tell us about the GPU you have in your system?

If it doesn't have an adequate GPU and you are forced to do CPU basecalling, then I would recommend using Guppy's "fast" config. CPU basecalling with the default config takes days to weeks on a single machine.

Reply written 6 weeks ago by Tom

Yep, I do high accuracy mode calling on a SLURM cluster since I don't have a GPU which will work with Guppy. It takes 6-7 times as long as fast mode calling. We need a GPU and/or a MinIT. That said, if you massively split the input, it is quite quick on a cluster too.
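One way the "massively split the input" idea could look is sketched below: the fast5 files of one directory are distributed round-robin over a number of batch directories, and each batch is then submitted as its own job. This is an illustration, not the setup actually used on that cluster. The function name and the batch_N naming are made up, and hard links are used so that no data is copied, which assumes source and destination are on the same filesystem.

```shell
# Sketch: split the .fast5 files of one directory into batch_0, batch_1, ... directories
# (round-robin, via hard links) so each batch can be basecalled as a separate cluster job.
split_fast5() {
    # $1 = source directory, $2 = destination directory, $3 = number of batches
    src=$1
    dest=$2
    n=$3
    i=0
    for f in "$src"/*.fast5; do
        [ -e "$f" ] || continue          # skip the literal pattern if nothing matched
        b="$dest/batch_$((i % n))"
        mkdir -p "$b"
        ln "$f" "$b/"                    # hard link: assumes the same filesystem
        i=$((i + 1))
    done
}
```

After something like `split_fast5 fast5 batches 50`, each directory under `batches/` could be passed to a job script with `sbatch`, one submission per batch.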

Reply written 5 weeks ago by colindaven

I have an NVIDIA GeForce 1030, and I've already realized that my GPU is not adequate for Guppy base calling. Could you tell me how to specify this "fast" config? I didn't manage to find it in the Guppy documentation. Moreover, part of my files have already been basecalled through MinKNOW, so I'm afraid that the output would differ from the rest of the files if they are base called in another manner.

Reply modified 5 weeks ago • written 5 weeks ago by anastasiia.satyr

Try:

guppy_basecaller --print_workflows

I get a lot of different workflows, e.g.:

# high accuracy
FLO-FLG001 SQK-LSK108           dna_r9.4.1_450bps_hac
#fast (I believe)
FLO-MIN107 SQK-RAB204 included  dna_r9.5_450bps
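If the listing is long, a small filter can pull out the config for one flow cell and kit. The sketch below assumes the layout seen in the excerpt above (one workflow per line, flow cell and kit in the first two columns, config name in the last column); other Guppy versions may print additional columns, so treat this as an illustration only. The helper name `lookup_config` is made up.

```shell
# Sketch: pick the config name for a flow cell / kit pair from --print_workflows output.
# Assumption: one workflow per line, with the config name as the last field.
lookup_config() {
    # $1 = flow cell (e.g. FLO-MIN107), $2 = kit (e.g. SQK-RAB204); reads the listing on stdin
    awk -v fc="$1" -v kit="$2" '$1 == fc && $2 == kit { print $NF }'
}

# Example with the two lines quoted in this thread:
printf '%s\n' \
    'FLO-FLG001 SQK-LSK108           dna_r9.4.1_450bps_hac' \
    'FLO-MIN107 SQK-RAB204 included  dna_r9.5_450bps' |
    lookup_config FLO-FLG001 SQK-LSK108   # prints: dna_r9.4.1_450bps_hac
```

With Guppy installed, the real listing could be piped in instead: `guppy_basecaller --print_workflows | lookup_config FLO-FLG001 SQK-LSK108`.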

In my SLURM script I have the following:

echo "Input directory: $1"
i=$1

# Assumption: $cpus was not defined in the posted snippet; here it is taken
# from the SLURM allocation (set by --cpus-per-task), falling back to 1.
cpus=${SLURM_CPUS_PER_TASK:-1}

# Add miniconda3 to PATH
. /mnt/ngsnfs/tools/miniconda3/etc/profile.d/conda.sh

# Activate env on cluster node (no name given, so this is the base env)
conda activate

### Run command - directory
# high accuracy, 7x+ slower (40+ hours)
#guppy_basecaller -i "$i" -s "$i.guppy" --cpu_threads_per_caller 1 --num_callers "$cpus" -c dna_r9.4.1_450bps_hac.cfg
# fast, lower accuracy, 7x+ faster (6 hours?)
guppy_basecaller -i "$i" -s "$i.guppy" --cpu_threads_per_caller 1 --num_callers "$cpus" -c dna_r9.4.1_450bps_fast.cfg
Reply modified 4 weeks ago • written 5 weeks ago by colindaven