Question

QC of fast5 files

0

6.9 years ago

#### ▴ 220

I have fast5 files from MinION data. I want to perform a quality check on these files. Are there any tools available for quality checking fast5 files?

As an alternative, I extracted fastq from the fast5 files and ran FastQC, but the results are not satisfactory: the quality scores are very low. I am not sure whether this is the right approach for assessing MinION data.

Minion fast5 fastq • 8.2k views

score 3 · Accepted Answer · 2017-06-06

6.9 years ago

WouterDeCoster 47k

"I want to perform the quality check for these files, are there any tools available for QC of fast5 files?"

It's not entirely clear what you want to do. What kind of QC would you like to investigate?
Anyway, I would like to promote a script I wrote, NanoPlot. It plots reads (fastq) and alignments (bam) from Oxford Nanopore sequencing data. I'm looking forward to your feedback. A few examples can be found on my blog. Note that I recently decided to remove fast5 support, because the latest basecaller outputs fastq directly.

I've also made something similar to two of the FastQC plots: per base sequence content and per base sequence quality.

Let me know if there's something else I can help you with.

0

It seems NanoPlot will do what I need. I am looking for something similar to what you did here: https://gigabaseorgigabyte.wordpress.com/2017/06/01/example-gallery-of-nanoplot/ , in particular the average read quality.

You mention the per base sequence content and quality plots. What are the values on the y-axis? If they are Phred scores, Q30 is usually considered the cut-off for a good quality read. In your plots all the bases show a quality value of 8, so I would like to know what threshold you use as the cut-off for good quality reads.
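For context on the Q8 versus Q30 comparison, here is a minimal sketch of the standard Phred relationship, Q = -10 * log10(P), where P is the probability that a base call is wrong. The function names are made up for this illustration. Under this definition Q30 corresponds to an error probability of 0.001 and Q8 to roughly 0.16.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P back to a Phred score Q = -10 * log10(P)."""
    return -10 * math.log10(p)


# Print the error probability and expected per-base accuracy for a few scores
# mentioned in this thread (8, 12, 17) and the Q30 figure from short-read data.
for q in (8, 12, 17, 30):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.4f}, expected accuracy {100 * (1 - p):.1f}%")
```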

6.9 years ago by #### ▴ 220

2

Please let me know how NanoPlot works for you and which issues you may encounter.

The per base sequence quality has improved quite a bit with the release of the latest Albacore basecaller; see also this post.

About filtering reads, have a look at this post. I also wrote NanoFilt for filtering and trimming. I don't know what your application is, but a cut-off of average basecall quality > 12 removes the worst quality reads. If you put the cut-off at 16 or 17 you get better reads still, but you also lose quite a lot of reads. Have a look at the plot below to decide:

[image: plot not preserved]
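To make the idea of an average-quality cut-off concrete, here is a minimal sketch in plain Python. It is not NanoFilt's code; the function names, the example reads, and the choice to average error probabilities (rather than raw Phred scores) are assumptions of this illustration, and it assumes Phred+33 encoded, 4-line FASTQ records.

```python
import math
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from simple 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines, e.g. at the end of a file
        seq = next(it, "").strip()
        next(it, "")  # '+' separator line
        qual = next(it, "").strip()
        yield header.strip(), seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean read quality, computed by averaging per-base error probabilities
    and converting the mean back to a Phred score (an assumption of this sketch)."""
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(records: Iterable[Record], min_quality: float) -> Iterator[Record]:
    """Keep only reads whose mean quality is above min_quality."""
    for header, seq, qual in records:
        if qual and mean_quality(qual) > min_quality:
            yield header, seq, qual


# Hypothetical toy input: ')' encodes Q8 and '5' encodes Q20 in Phred+33.
example = [
    "@read_low", "ACGT", "+", "))))",
    "@read_high", "ACGT", "+", "5555",
]
kept = [header for header, _, _ in filter_reads(parse_fastq(example), min_quality=12)]
print(kept)
```

Averaging error probabilities gives a lower value than the arithmetic mean of the Phred scores whenever the scores within a read differ, so the same numeric cut-off is stricter under this definition.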

6.9 years ago by WouterDeCoster 47k

0

Yes, I'll let you know how it looks for my data. It's RNA-Seq data; are there any particular considerations for RNA-Seq?

6.9 years ago by #### ▴ 220

0

How did you prepare the library?

You would expect shorter fragments in RNA-seq, but that shouldn't be a real problem. I haven't tested it on RNA-seq, so this is an interesting test. I suggest you use GMAP for the alignment. Perhaps the percent identities you obtain will be lower since spliced alignment might introduce some mismapping.
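As a rough illustration of what a percent identity could look like for a spliced alignment, here is a minimal sketch that works from a SAM CIGAR string and the NM (edit distance) tag. The definition used here, matches divided by alignment columns, is only one of several in use and is an assumption of this sketch, as is the function name. It also assumes the aligner reports NM. Introns (N), soft clips (S) and hard clips (H) are not counted as alignment columns.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def percent_identity(cigar: str, nm: int) -> float:
    """Percent identity as 100 * (columns - NM) / columns, where columns counts
    M, =, X, I and D operations and NM is mismatches plus inserted and deleted bases."""
    columns = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MID=X")
    if columns == 0:
        raise ValueError("CIGAR string has no aligned columns")
    return 100 * (columns - nm) / columns


# Hypothetical spliced read: 5 soft-clipped bases, two exons split by a 1000 bp intron,
# a 2 bp insertion and a 3 bp deletion, with an edit distance (NM) of 10.
print(percent_identity("5S50M1000N45M2I3D", nm=10))
```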

6.9 years ago by WouterDeCoster 47k

0

I prefer STAR over GMAP, as STAR gives more specific alignments and has a lower false positive rate than GMAP. STAR has also given promising results on PacBio long reads.

6.9 years ago by #### ▴ 220

0

Well, I've tried STAR for Oxford Nanopore sequencing, and my experience is different. I assume those were PacBio CCS reads?

6.9 years ago by WouterDeCoster 47k