Discrepancy in Q-score assessment of ONT reads in Nanopore and third-party software reports
5 hours ago
k-tarasov ▴ 10

Hello! We perform 16S profiling of a microbial community. We sequenced a library of PCR-amplified 16S sequences on R10.4.1 pores and basecalled the reads with Dorado. The problem is that when I compute statistics for the same fastq file with different software, I get different results:

1) When the data is analysed with Nanopore software (either the wf-16s pipeline, which generates read-quality statistics in its final report, or NanoPlot), I get an average read Q-score of ~21.

2) When the data is analysed with FastQC + MultiQC or seqkit, I get an average read Q-score >30.

seqkit output:

file              format  type   num_seqs        sum_len  min_len  avg_len  max_len     Q1     Q2     Q3  sum_gap    N50  Q20(%)  Q30(%)
all_raw.fastq.gz  FASTQ   DNA   2,429,312  3,492,160,612        1  1,437.5   79,704  1,461  1,495  1,504        0  1,496    86.3   75.34

Where does such a huge discrepancy come from?

I also asked this question on GitHub; the issue there includes screenshots from the FastQC + MultiQC and wf-16s pipeline reports:

https://github.com/epi2me-labs/wf-16s/issues/39

Thank you!

seqkit q-score fastqc 16s-wf nanoplot

Are you only looking at reads that satisfy this filter (which appears in your command line in the GitHub post) in your seqkit and FastQC reports?

--min_len 1400 --max_len 1600 --min_read_qual 10 

With this filter, do the read numbers going into the two programs match?
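
For example, a minimal Python sketch of such a check (the file name and thresholds are taken from the post; it assumes the pipeline's per-read quality is the Phred value of the averaged per-base error rates, which the discussion below suggests is how the Nanopore tools compute it):

import gzip
import math

def mean_q(quals):
    # Average the per-base error probabilities, then convert the
    # mean error rate back to a Phred score.
    mean_err = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_err)

kept = total = 0
with gzip.open("all_raw.fastq.gz", "rt") as fh:
    for i, line in enumerate(fh):
        if i % 4 == 3:  # every fourth line of a FASTQ record is the quality string
            total += 1
            quals = [ord(c) - 33 for c in line.rstrip("\n")]
            if 1400 <= len(quals) <= 1600 and mean_q(quals) >= 10:
                kept += 1

print(f"{kept:,} of {total:,} reads pass the length/quality filter")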

Thank you for your answer. No, I am looking at all reads, before applying any filters.

There are 2,429,312 raw reads; after applying the filters --min_len 1400 --max_len 1600 --min_read_qual 10, the wf-16s pipeline kept a total of 2,049,340 reads.

4 hours ago
k-tarasov ▴ 10

Thanks to colindaven's answer, I managed to follow the hyperlinks to the source of the discrepancy. Nanopore tools compute the average Q-score of a read as follows: they convert the individual per-base Q-scores to error probabilities, average those probabilities, and then convert the average error rate back to a Phred value. That value is reported as the average Q-score. Third-party tools such as FastQC, seqkit, fastp, and fastplong, as far as I know, instead take the arithmetic mean of the Q-scores themselves (summing the per-base Q-scores and dividing by the number of bases). Because the Phred scale is logarithmic, the arithmetic mean overestimates the quality.
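
A minimal Python sketch of the two conventions (the quality values are invented, only to show how far apart the two means can get):

import math

def mean_q_arithmetic(quals):
    # FastQC/seqkit-style: arithmetic mean of the per-base Q-scores.
    return sum(quals) / len(quals)

def mean_q_from_error_rates(quals):
    # Nanopore/Dorado-style: convert each Q-score to an error
    # probability, average the probabilities, convert back to Phred.
    mean_err = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_err)

quals = [40] * 99 + [10]               # 99 bases at Q40, one at Q10
print(mean_q_arithmetic(quals))        # 39.7
print(mean_q_from_error_rates(quals))  # ~29.6

The single Q10 base contributes almost all of the expected errors, so it dominates the error-rate mean, while the arithmetic mean barely notices it.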

Useful links to read more about it:

https://gigabaseorgigabyte.wordpress.com/2017/06/26/averaging-basecall-quality-scores-the-right-way/

https://community.nanoporetech.com/posts/what-is-the-base-value-for

The Dorado source lines where the mean Q-score is computed:

https://github.com/nanoporetech/dorado/blob/a7fb3e3d4afa7a11cb52422e7eecb1a2cdb7860f/dorado/utils/sequence_utils.cpp#L132

5 hours ago
colindaven

I agree completely. I think the cause is well discussed here: https://github.com/OpenGene/fastplong/issues/20

fastplong still appears to have this problem, though, so I would trust the NanoPlot or Chopper results more. This issue can have large effects on Q-score filtering prior to assembly or alignment.
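
For example, with a hypothetical read of 80 bases at Q20 plus a 20-base Q2 tail, the two conventions land on opposite sides of a Q10 cutoff:

import math

quals = [20] * 80 + [2] * 20  # hypothetical per-base Q-scores

arith = sum(quals) / len(quals)                              # 16.4 -> passes Q10
mean_err = sum(10 ** (-q / 10) for q in quals) / len(quals)
phred = -10 * math.log10(mean_err)                           # ~8.7 -> fails Q10
print(f"arithmetic mean: {arith:.1f}, error-rate mean: {phred:.1f}")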
