Question: Nanopore data and downstream analysis
11 months ago, MAPK (United States) wrote:

I have a few microbiome datasets sequenced on a Nanopore MinION. For each run, I have pass, fail and skip directories. Within the pass directory, I also have subdirectories numbered 0 to 10. Could someone please explain the difference between the pass, fail and skip data, and which data I should be analyzing? I would also like to understand what the numbered subdirectories within the pass directory mean. Thank you for your help.

11 months ago, WouterDeCoster wrote:

You did not explain how you basecalled the data, but I assume you used live basecalling in MinKNOW. I use Albacore, and I would recommend you do the same for future runs. Live basecalling occasionally creates problems, and running Albacore afterwards on a server or cluster is usually beneficial.

The categories you asked about:

  • pass: reads have an average quality score > Q7
  • fail: reads have an average quality score < Q7
  • skip: I think these reads were not basecalled due to time constraints; you can still basecall them using Albacore
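
As a rough illustration of the pass/fail split, here is a minimal sketch that computes a read's average quality from its FASTQ quality string and compares it to Q7. The function names are hypothetical, Phred+33 encoding is assumed, and averaging over error probabilities (rather than over the Phred scores directly) is an assumption about how the basecaller defines "average quality". How a read at exactly Q7 is handled is also an assumption.

```python
import math


def mean_quality(quality_string, offset=33):
    """Mean Phred quality of one read.

    Assumption: the mean is taken over per-base error probabilities and
    then converted back to a Phred score, not over the raw Phred values.
    """
    if not quality_string:
        raise ValueError("empty quality string")
    # Convert each quality character to an error probability.
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def classify(quality_string, threshold=7.0):
    """Label a read 'pass' if its mean quality is above the threshold."""
    # Assumption: a read at exactly the threshold is treated as 'fail'.
    return "pass" if mean_quality(quality_string) > threshold else "fail"
```

For example, `classify("++++")` gives `"pass"` because every base is Q10, while a read with one Q20 base and one Q0 base averages to roughly Q3 under this definition, not Q10.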

Since I haven't used MinKNOW for basecalling, I'm not entirely sure about the skip part.

Whether you want to combine the pass and fail reads is up to you and depends on your application. I tend to keep them both.

The subdirectories are made per 4000 reads (I believe that's the default) to avoid directories with far too many files (fast5 format).
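
The batching arithmetic can be sketched as follows. This assumes a batch size of 4000 and that reads are assigned to subdirectories in the order they are written, starting from 0; both function names are made up for illustration.

```python
def subdir_for_read(read_index, batch_size=4000):
    """Subdirectory number for the read at a zero-based position.

    Assumption: reads fill subdirectory 0 first, then 1, and so on.
    """
    return read_index // batch_size


def n_subdirs(n_reads, batch_size=4000):
    """Number of subdirectories needed to hold n_reads (ceiling division)."""
    return -(-n_reads // batch_size)
```

Under these assumptions, subdirectories 0 to 10 (11 in total) would correspond to somewhere between 40,001 and 44,000 reads in the pass directory.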


Thank you so much for your answer. Yes, the basecalling was done using live basecalling. Now I have some more questions:

  1. Would you suggest concatenating all the fasta reads extracted from the fast5 files in all subdirectories?
  2. Do I need to separate 1D and 2D reads, and which type of reads should I be using?
  3. What downstream analyses can I perform, and which tools (besides poretools) can I use for these reads?
  4. Under what circumstances should you use fasta reads extracted from fast5 files rather than the fastq files from the run itself?

Thank you again for your help.

Reply written 11 months ago by MAPK

  1. You should be able to get a fastq file, and yes, you can concatenate these. You could also choose to concatenate them per directory and have multiple fastq files for parallel processing (depending on your needs).
  2. I have no idea which type of sequencing you have performed, but 2D has been deprecated for quite a while now. So this is old data?
  3. I don't know what your biological question is. I've written NanoPack: a set of scripts for visualizing and processing long read sequencing data, which might be useful for you (feedback welcome).
  4. I don't see a reason to use the fasta reads.
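
Concatenating the per-directory fastq files (point 1) could look like the sketch below. The function name and the directory layout (fastq files somewhere under a pass directory) are assumptions, and each input file is read into memory in one piece, which should be acceptable for batches of a few thousand reads.

```python
from pathlib import Path


def concatenate_fastq(pass_dir, output_path):
    """Concatenate every *.fastq file under pass_dir into output_path.

    Returns the number of input files that were merged.
    """
    pass_dir = Path(pass_dir)
    output_path = Path(output_path).resolve()
    # Collect inputs first and skip the output file if it lives under pass_dir.
    inputs = sorted(
        p for p in pass_dir.rglob("*.fastq") if p.resolve() != output_path
    )
    with open(output_path, "wb") as out:
        for path in inputs:
            data = path.read_bytes()
            out.write(data)
            # Make sure the next file's first record starts on a new line.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return len(inputs)
```

Calling the same function once per subdirectory instead of on the whole pass directory would give one fastq file per batch for parallel processing.
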
Reply written 11 months ago by WouterDeCoster

OK, thank you. No, this is new data. For each microbiome sample I also have two directories, each containing multiple fast5 files and one fastq file. Should I use the fastq file generated by live basecalling, or should I convert all the fast5 files to fastq?

Reply written 11 months ago by MAPK

I expect the fastq to contain all reads from that folder. You can easily verify that by counting the files and comparing the number to the number of reads in the fastq, although in rare cases a fast5 may fail basecalling and not lead to a read.
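
A minimal sketch of that check, assuming one read per fast5 file, standard four-line fastq records, and hypothetical function names:

```python
from pathlib import Path


def count_fastq_reads(fastq_path):
    """Count reads in a fastq file, assuming four lines per record."""
    with open(fastq_path) as handle:
        n_lines = sum(1 for line in handle if line.strip())
    return n_lines // 4


def count_fast5_files(directory):
    """Count *.fast5 files in a directory and its subdirectories."""
    return sum(1 for _ in Path(directory).rglob("*.fast5"))
```

If the fastq read count is slightly lower than the fast5 file count, that would be consistent with a few files failing basecalling; a large difference suggests the fastq does not cover the whole folder.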

Reply written 11 months ago by WouterDeCoster