Question: QC of fast5 files
####, 21 months ago, wrote:

I have fast5 files from MinION data and want to run a quality check on them. Are there any tools available for QC of fast5 files?

As an alternative, I extracted fastq from the fast5 files and ran FastQC on it, but the results are not satisfactory: the quality scores are very low. I am not sure whether this is the right approach for assessing MinION data.

Tags: fast5, minion, fastq
WouterDeCoster (Belgium), 21 months ago, wrote:

"I want to perform the quality check for these files. Are there any tools available for QC of fast5 files?"

It's not entirely clear what you want to do. What kind of QC would you like to investigate?
Anyway, I would like to promote a script I wrote, NanoPlot. It plots reads (fastq) and alignments (bam) from Oxford Nanopore sequencing data. I'm looking forward to your feedback. A few examples can be found on my blog. Note that I recently decided to remove fast5 support, because the latest basecaller directly outputs fastq.

I've also made something similar to two of the FastQC plots: per base sequence content and per base sequence quality.

Let me know if there's something else I can help you with.


It seems NanoPlot will do what I need. I am looking for something similar to what you show here: https://gigabaseorgigabyte.wordpress.com/2017/06/01/example-gallery-of-nanoplot/ , in particular the average read quality.

You mention the per base sequence content and quality plots. What are the values on the y-axis? If they are Phred scores, Q30 is usually considered the cut-off for a good quality read. In your plots all the bases show a quality value of 8, so I would like to know what threshold you use as the cut-off for good quality reads.

Reply written 21 months ago by ####
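For reference on the Phred scale raised in the question above: a Phred score Q corresponds to a per-base error probability of 10^(-Q/10), so Q8 is roughly a 16% error rate and Q30 is 0.1%. The following is a minimal sketch of that conversion; the function names are illustrative and are not taken from NanoPlot or FastQC.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert a per-base error probability back to a Phred score."""
    return -10 * math.log10(p)

# Compare the scores mentioned in this thread
for q in (8, 12, 16, 30):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

This makes the gap between the two platforms' typical scores concrete: a Q30 cut-off that is common for short reads would discard almost every read at the quality levels discussed here.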

Please let me know how NanoPlot works for you and which issues you may encounter.

Per base sequence quality has improved quite a bit with the release of the latest Albacore basecaller; see also this post.

About filtering reads, have a look at this post. I also wrote NanoFilt for filtering and trimming. I don't know what your application is, but using a cut-off of average basecall quality > 12 removes the worst quality reads. Raising the cut-off to 16 or 17 gives you better reads still, but you also lose quite a lot of them. Have a look at the plot below to decide:

[Image: plot not preserved]

Reply written 21 months ago by WouterDeCoster
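To illustrate the kind of filtering on average basecall quality described in the reply above, here is a minimal sketch in plain Python. It is not NanoFilt's implementation. It assumes four-line FASTQ records with Phred+33 encoding, and it assumes the average is taken over error probabilities rather than over raw Phred scores; check how your own tool defines the mean before comparing thresholds. All function names are hypothetical.

```python
import io
import math

def mean_read_quality(quality_string, offset=33):
    """Mean quality of one read, averaged via error probabilities (assumes Phred+33)."""
    probs = [10 ** (-(ord(char) - offset) / 10) for char in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))

def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def filter_by_quality(records, min_quality=12):
    """Keep only reads whose mean quality is above min_quality."""
    for header, seq, qual in records:
        if qual and mean_read_quality(qual) > min_quality:
            yield header, seq, qual

# Tiny made-up example: one Q20 read and one Q5 read
example = io.StringIO("@good\nACGT\n+\n5555\n@bad\nACGT\n+\n&&&&\n")
kept = [header for header, _, _ in filter_by_quality(read_fastq(example))]
print(kept)  # ['@good']
```

Averaging error probabilities gives a lower value than averaging the Phred scores directly whenever a read mixes good and bad bases, so a cut-off of 12 under one definition is not equivalent to 12 under the other.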

Yes, I'll let you know how it looks for my data. It's RNA-Seq data; are there any particular considerations for RNA-Seq?

Reply written 21 months ago by ####

How did you prepare the library?

You would expect shorter fragments in RNA-seq, but that shouldn't be a real problem. I haven't tested it on RNA-seq, so this is an interesting test. I suggest you use GMAP for the alignment. Perhaps the percent identities you obtain will be lower since spliced alignment might introduce some mismapping.

Reply written 21 months ago by WouterDeCoster

I prefer STAR over GMAP, as STAR gives more specific alignments and a lower false positive rate than GMAP. STAR has also given promising results on PacBio long reads.

Reply written 21 months ago by ####

Well, I've tried STAR for Oxford Nanopore sequencing, and my experience is different. I assume PacBio CCS reads?

Reply written 21 months ago by WouterDeCoster