Help me understand the Nanopore fastqc results
8 weeks ago
Assa Yeroslaviz ★ 1.7k

Hi,

I have got my first Nanopore sequencing data, and the first step was to see if the data is good. Does anyone have experience with this kind of data and can tell me how to interpret the results?

The whole report can be downloaded here (not sure how to post it here). All in all it looks quite good to me, but I'm not sure about the two images attached here. These are the per-base quality and per-base sequence content plots. It seems that the beginning of the reads is not good, but this can probably be fixed by trimming adapters etc. But do I really need to remove the first 1000 positions? This seems a bit extreme.

About the sequence content, I don't know where to begin. This looks consistent with the first image, where the quality is not good; if I remove these positions, it should get better. But is it OK for the two pairs, TG and AC, to be so far apart like that?

EDIT:

I have now run both pycoQC and minionQC (R). They show similar results. The reports can be downloaded from here; a few of the images are also attached.

[Figures: Basecalled reads PHRED quality; Output over experiment time; Basecalled reads length vs reads PHRED quality; Channel activity over time]

To me it looks as if the run was past its prime after ~50 h; there is no real gain of new reads afterwards. As I usually work with mRNA-Seq, I'm not sure how to judge the PHRED quality. I can see that most reads have Q ~8 and that most of them are short (which is also expected). But all in all, can this run be classified as good?

thanks

[Figures: per base QC; per base Seq content]

long-read-seq nanopore fastqc • 470 views

Please use a program that is more suitable for QC'ing Nanopore data, such as pycoQC (LINK) or NanoPlot (LINK), and then show us what those reports look like.


I have edited the post and added results from pycoQC.


It would help to clarify what kind of data this is. You have an assessment from @Trivas below but in some contexts this may be an expected outcome.

8 weeks ago

A few of your reads are very long, and those skew and distort the plots.

There is also no binning for the first 10 bases, and then positions are grouped into huge bins afterwards, which again makes the plot misleading.

Do not remove the beginning of the reads for QC reasons; that is rarely an advisable course of action.

I would filter the reads to a more manageable length, say 15 kb, and rerun the QC analysis.
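As a rough illustration of the length filter suggested above, here is a minimal Python sketch. It assumes plain 4-line-per-record FASTQ (no wrapped sequence lines); the function names `read_fastq`, `filter_by_length` and `write_fastq` are made up for this example and are not part of any QC tool. Dedicated read-filtering tools exist and are probably the better choice for real runs; this only shows the idea.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

# (header, sequence, separator, quality) -- hypothetical record layout for this sketch
FastqRecord = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming exactly 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line): stop reading
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def filter_by_length(
    records: Iterable[FastqRecord], min_len: int = 0, max_len: int = 15_000
) -> Iterator[FastqRecord]:
    """Keep only reads whose sequence length lies within [min_len, max_len]."""
    for record in records:
        if min_len <= len(record[1]) <= max_len:
            yield record


def write_fastq(records: Iterable[FastqRecord], handle: TextIO) -> int:
    """Write records back out as 4-line FASTQ and return how many were written."""
    kept = 0
    for record in records:
        handle.write("\n".join(record) + "\n")
        kept += 1
    return kept


# Tiny in-memory demo: one 4-base read and one 20-base read, capped at 10 bases.
demo_in = io.StringIO(
    "@short\nACGT\n+\nIIII\n@long\n" + "A" * 20 + "\n+\n" + "I" * 20 + "\n"
)
demo_out = io.StringIO()
n_kept = write_fastq(filter_by_length(read_fastq(demo_in), max_len=10), demo_out)
```

For real data, the same functions could be fed handles from `gzip.open(path, "rt")` and `gzip.open(path, "wt")`, and the QC report rerun on the filtered file.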


I can remove reads longer than 15 kb. Do people also remove reads below a minimum length (e.g. 500 nt)? How about adapter removal and/or quality filtering: is it necessary? Or are the assemblers good enough to work with the raw data passed by Guppy?


The point I was trying to make is that you have very few long reads, and plotting them squishes the plots and makes it harder to see what most of the reads look like.

Remove adapters (there might not even be that many), then move on with the analysis, in my opinion. Nanopore reads can be very unreliable individually, but downstream methods can reconcile the overlapping reads.
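To put "unreliable individually" into numbers, a Phred score Q corresponds to a per-base error probability of 10^(-Q/10), so the Q ~8 mentioned in the question is roughly a 16% error rate per base. The sketch below shows that conversion, plus a per-read mean quality computed by averaging error probabilities rather than raw Q values, which is my understanding of how Nanopore QC tools usually summarise reads. The function names are hypothetical, and the Phred+33 offset is an assumption about the FASTQ encoding.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def mean_read_quality(qual_string: str, offset: int = 33) -> float:
    """Mean read quality from a FASTQ quality string (Phred+33 assumed).

    The error probabilities are averaged and converted back to a Phred
    score, so a few very poor bases pull the result down more than a
    plain average of the Q values would.
    """
    if not qual_string:
        raise ValueError("empty quality string")
    probs = [phred_to_error_prob(ord(ch) - offset) for ch in qual_string]
    return -10 * math.log10(sum(probs) / len(probs))


# Error probability for the typical read quality reported in the question.
q8_error = phred_to_error_prob(8)

# '+' encodes Q10 and '5' encodes Q20 under Phred+33.
mixed_quality = mean_read_quality("+5")
```

In the small example, the averaged-probability result for one Q10 and one Q20 base comes out below 15, the plain mean of the two Q values.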

8 weeks ago
Trivas ▴ 420

Your median read length is < 500 bases long and I'd guess that your mean read length is even shorter. So while your data are ok for ONT (maybe slightly below average quality), you are probably better off using short-read technology to get higher quality data with these fragment sizes.
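If you want to check these length statistics yourself rather than read them off a plot, a short sketch like the following computes mean, median and N50 from a list of read lengths. `length_summary` is a hypothetical helper and the example lengths are invented purely for illustration; comparing mean and median also shows how strongly a handful of very long reads affects the summary.

```python
import statistics
from typing import Dict, Sequence


def length_summary(lengths: Sequence[int]) -> Dict[str, float]:
    """Summarise read lengths: read count, total bases, mean, median and N50."""
    if not lengths:
        raise ValueError("no read lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)

    # N50: the length L such that reads of length >= L hold at least half of all bases.
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    return {
        "n_reads": len(ordered),
        "total_bases": total,
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "n50": n50,
    }


# Invented lengths: four short reads and one very long one.
summary = length_summary([100, 200, 300, 400, 10_000])
```

The lengths themselves could come from `len(seq)` for each record while parsing the FASTQ file.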
