4 months ago
matt81rd

I am trying to assess the quality of some metagenomic ONT data. It was suggested that I use NanoPlot to assess the 'quality' of the data, but I don't really understand the output. I know that the higher the quality score, the higher the probability that a particular base call is correct.

However, I have two ONT datasets that are identical apart from the type of sequencing kit used. They have different quality scores, but the one with the higher quality score has far fewer total reads. Does the number of reads affect the quality score?

E.g. data1 = quality score 11 with 500,000,000 total reads. data2 = quality score 12 with 100,000,000 total reads.

Also, could someone explain how a quality score translates to percent accuracy? I have seen that a quality score of 10 corresponds to a percent identity of 90. How would I work out what a quality score of 12 corresponds to?
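The conversion follows from the Phred definition, Q = -10 * log10(P_error), so the expected per-base accuracy is 1 - 10^(-Q/10). A minimal sketch (the function names are made up for illustration); note this is the accuracy implied by the basecaller's own scores, and the identity you actually measure against a reference can differ:

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred quality q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def phred_to_accuracy_pct(q: float) -> float:
    """Expected per-base accuracy (%) implied by a Phred quality score."""
    return 100 * (1 - phred_to_error_prob(q))


def accuracy_pct_to_phred(acc_pct: float) -> float:
    """Inverse conversion: Q = -10 * log10(P_error)."""
    return -10 * math.log10(1 - acc_pct / 100)


if __name__ == "__main__":
    for q in (10, 11, 12, 20):
        print(f"Q{q}: {phred_to_accuracy_pct(q):.2f}% expected accuracy")
```

This prints roughly 90.00% for Q10, 92.06% for Q11, 93.69% for Q12 and 99.00% for Q20. Every +10 in Q is a tenfold drop in the error rate, so Q11 to Q12 is a modest change.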

I also haven't filtered or trimmed the reads before running NanoPlot. Is this something I need to do first?

None of this is specific to NanoPlot, but the scores you are mentioning are Phred-scaled quality scores. Reading about that might clear up some confusion. Each nucleotide that is sequenced gets a score, independent of the number of reads.
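To illustrate why the read count does not enter into it: each base's score is stored in the read's FASTQ quality string (Phred+33 encoding), and a per-read average is computed only from that read's own scores. A sketch, with hypothetical function names. It assumes the average is taken in error-probability space (convert each Q to 10^(-Q/10), average, convert back). My understanding is that NanoPlot does it this way, but check its documentation before relying on that:

```python
import math
from typing import List


def decode_phred33(qual: str) -> List[int]:
    """Convert a FASTQ quality string (Phred+33 encoding) to per-base Q scores."""
    return [ord(ch) - 33 for ch in qual]


def mean_read_quality(qual: str) -> float:
    """Per-read average quality, averaged in error-probability space (assumed method)."""
    scores = decode_phred33(qual)
    if not scores:
        raise ValueError("empty quality string")
    # Average the per-base error probabilities, then convert back to Phred.
    mean_err = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return -10 * math.log10(mean_err)


def naive_mean_quality(qual: str) -> float:
    """Plain arithmetic mean of the Q scores, shown for comparison."""
    scores = decode_phred33(qual)
    return sum(scores) / len(scores)
```

For a two-base read with quality string `+5` (Q10 and Q20), the naive mean is 15, while the probability-space mean is about 12.6, because the single Q10 base dominates the expected error. Neither number depends on how many other reads are in the run.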

It would also probably be helpful if you showed the output that confuses you.

For comparing samples/experiments I recommend NanoComp.

Yes, apologies, you are right; I'm not being very specific.

I guess what I'm looking for is this: I have seen NanoPlot used to create a plot comparing per-read average basecall quality against percent identity. I'm not sure how to produce that with NanoPlot, as it isn't a plot created by default. Does a plot like that require BAM files? (I only have the ONT FASTQs currently.)

Thank you for the link and the NanoComp suggestion; I'll give them a look :)

Yes, percent identities are calculated from the alignment to the reference genome.
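As a rough illustration of what "calculated from the alignment" means, here is a sketch that derives one read's percent identity from a SAM record's CIGAR string and NM tag (edit distance to the reference). These would come from aligning the FASTQ reads to a reference, e.g. with minimap2. The function name is hypothetical. The definition used here is an assumption: 100 * (1 - NM / alignment columns), where the columns are M/=/X plus I and D. NanoPlot's own formula may differ slightly:

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def percent_identity(cigar: str, nm: int) -> float:
    """Percent identity of one alignment (assumed definition, see lead-in).

    Alignment columns = aligned bases (M, =, X) plus insertions (I) and
    deletions (D). Clips (S, H), skips (N) and padding (P) are ignored.
    """
    columns = 0
    for length, op in CIGAR_RE.findall(cigar):
        if op in "M=XID":
            columns += int(length)
    if columns == 0:
        raise ValueError("alignment has no aligned columns")
    return 100.0 * (1 - nm / columns)
```

For example, `percent_identity("10S90M5I5D", 10)` gives 90.0: the 10 soft-clipped bases are ignored, leaving 100 alignment columns with 10 edits. Pairing this value with each read's mean quality is what produces a quality-versus-identity plot, which is why the alignments (BAM) are needed and FASTQ alone is not enough.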


