Why does Guppy output different sequence data (same model)?
12 months ago

I ran a nanopore sequencing run on a Mk1C device with live basecalling and obtained FASTQ files in the fastq_pass and fastq_fail folders. When I reran the basecalling on a different machine, I found that the two produce different sequences, even in a single test case, e.g.:

In the FASTQ basecalled on the Mk1C:

> @b2e79451-050f-4d74-b091-40e6b6ee2229 runid=43a904bc3c628e3d9e32355643c0236ba632c012 read=110919 ch=21 start_time=2023-04-13T14:22:57.269028+00:00 flow_cell_id=FAR89176 protocol_group_id=STARRS sample_id=no_sample barcode=barcode01 barcode_alias=barcode01 parent_read_id=b2e79451-050f-4d74-b091-40e6b6ee2229 basecall_model_version_id=2021-05-17_dna_r9.4.1_minion_384_d37a2ab9
ATTTATCCTTGTACTTCCAGTTGCAGTAGGTGTTTAACCAGAAAGTTGTAAGTGTCGCTGTGGTTTTCGCATTTATCGTGAAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCAGTATTTGAAATCTTTATATCTTGATTAATTTCATTTCCGTTTGAAATTGCTGATTTGTTGTCTAACTTTAAACTTGTGTCCGATGTTTTTTAACAGCACCTTCATTTTTATTTTGTCTTTTGTCGTATTTTTATTAGCATTTAA

And when I rebasecalled it on my workstation:

> @b2e79451-050f-4d74-b091-40e6b6ee2229 runid=43a904bc3c628e3d9e32355643c0236ba632c012 sampleid=no_sample read=110919 ch=21 start_time=2023-04-13T04:52:57Z model_version_id=2021-05-17_dna_r9.4.1_minion_384_d37a2ab9
ATTTATCCTTGTACTTCCAGTTGCAGGTAGGTGTTTAACCAGAAAGTTGTAAGTGTCGCTGTGGTTTTCGCATTTATCGTGAAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCAGTATTTGAAATCTTTATATCTTGATTAATTTCATTTCCGTTTGAAATTGCTGATTTGTTGTCTAACTTTAAACTTGTGTCCGATGTTTTTTAACAGCACCTTCATTTTTATTTTGTCTTTTGTCGTATTTTTATTAGCATTTAA

The specific fast5 that I tested came from the fast5_pass folder, and I can find the read ID and run ID in the fastq_pass file, yet the rebasecalled sequence in the example was put into the fastq_fail folder. There are also some reads whose lengths differ slightly between the two runs, so my questions are:

  1. am I doing something wrong? I have:

    • set the model used for basecalling to be the same one (--flowcell "FLO-MIN106" --kit "SQK-RBK110-96" with high accuracy)
    • set the min_qscore to be the same

      The only difference I can find is the version of the Guppy basecaller:

      • in mk1c: Version 6.4.6+ae70e8fa0, minimap2 version 2.24-r1122
      • in workstation: Version 6.4.8+31becc9, minimap2 version 2.24-r1122

  2. Given the situation, should I (1) rebasecall everything with the newer version of Guppy, or (2) basecall only my fast5_skip files and use them in combination with the existing basecalled data?
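
For the example read above, one way to see exactly where the two basecalls diverge is a small diff with Python's standard `difflib`. This is only a sketch: the two strings are the first 41 and 42 bases copied from the reads above, and the helper name `diff_ops` is made up for illustration.

```python
import difflib

# First bases of the example read, copied from the two FASTQ records above.
MK1C_PREFIX = "ATTTATCCTTGTACTTCCAGTTGCAGTAGGTGTTTAACCAG"
WORKSTATION_PREFIX = "ATTTATCCTTGTACTTCCAGTTGCAGGTAGGTGTTTAACCAG"


def diff_ops(a, b):
    """Return (tag, a_start, a_end, a_segment, b_segment) for each diff block.

    tag is one of 'equal', 'insert', 'delete', 'replace', as defined by difflib.
    autojunk=False turns off difflib's popularity heuristic, which is not
    meaningful for a four-letter DNA alphabet.
    """
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, i1, i2, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


# Print only the blocks where the two basecalls disagree.
for tag, start, end, a_seg, b_seg in diff_ops(MK1C_PREFIX, WORKSTATION_PREFIX):
    if tag != "equal":
        print(f"{tag} at Mk1C positions {start}-{end}: {a_seg!r} -> {b_seg!r}")
```

On these prefixes the only non-matching block is a single extra G in the workstation basecall. The exact position reported for an indel inside a run of identical bases is somewhat arbitrary, so treat the coordinates as approximate.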

guppy Nanopore • 912 views
12 months ago

If you have a GPU, just re-basecall everything with the SUP (super-accuracy) model using the most recent Guppy.

It is impossible to say what is going on here from the evidence of a single read. When comparing nanopore basecalling or runs, you want to think about read distributions, Q values, mapped accuracy (cramino is good for this), and file sizes, not single reads.
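
As a rough illustration of comparing runs in aggregate rather than read by read, here is a minimal standard-library Python sketch. It assumes plain 4-line FASTQ records with Phred+33 qualities; the function names and the tiny in-memory records are invented for the example. The mean Q-score averages error probabilities rather than Q values, which is the usual convention for nanopore reads, but it may not match the exact calculation Guppy uses for its pass/fail filter.

```python
import io
import math
import statistics


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # skip stray blank lines
            continue
        seq = handle.readline().strip()
        handle.readline()       # '+' separator line
        qual = handle.readline().strip()
        yield header[1:].split()[0], seq, qual


def mean_qscore(qual):
    """Mean Q computed by averaging per-base error probabilities (Phred+33)."""
    probs = [10 ** (-(ord(c) - 33) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def summarize(records):
    """Per-run summary: read count, median length, median per-read mean Q."""
    return {
        "reads": len(records),
        "median_length": statistics.median(len(s) for _, s, _ in records),
        "median_mean_q": statistics.median(mean_qscore(q) for _, _, q in records),
    }


def compare_runs(run_a, run_b):
    """Count read IDs present in both runs and how many have identical sequence."""
    a = {rid: seq for rid, seq, _ in run_a}
    b = {rid: seq for rid, seq, _ in run_b}
    shared = a.keys() & b.keys()
    identical = sum(1 for rid in shared if a[rid] == b[rid])
    return {"shared": len(shared), "identical": identical}


# Toy records standing in for the Mk1C and workstation FASTQ files.
mk1c = list(read_fastq(io.StringIO(
    "@read1 ch=1\nACGT\n+\nIIII\n@read2 ch=2\nACGTAC\n+\n555555\n")))
workstation = list(read_fastq(io.StringIO(
    "@read1 ch=1\nACGT\n+\nIIII\n@read2 ch=2\nACGGTAC\n+\n5555555\n")))

print(summarize(mk1c))
print(summarize(workstation))
print(compare_runs(mk1c, workstation))
```

For real data, the `io.StringIO` objects would be replaced by opened FASTQ files (for example `open(path)` or `gzip.open(path, "rt")`), with pass and fail folders pooled for each run before comparing.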
