I have some microbiome data sequenced with a Nanopore MinION. For each run, I have pass, fail and skip directories. Within the pass directory, I also have subdirectories numbered 0 to 10. Could someone please explain to me the difference between the pass, fail and skip data, and which data I should be analyzing? I would also like to understand what the numbered subdirectories within the pass directory mean. Thank you for your help.
You did not say how you basecalled the data, but I assume you used live basecalling in MinKNOW. I use Albacore, and I would recommend you do the same for future runs. Live basecalling occasionally causes problems, and running Albacore afterwards on a server or cluster is usually beneficial.
The categories you asked about:
- pass: reads have an average quality score > Q7
- fail: reads have an average quality score < Q7
- skip: I think these reads were not basecalled due to time constraints. You can still basecall them using Albacore.
Since I haven't used MinKNOW for basecalling, I'm not entirely sure about the skip part.
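To make the pass/fail split concrete, here is a minimal sketch of how a mean read quality could be computed from a FASTQ quality string and compared against Q7. It assumes Phred+33 encoding and assumes the mean is taken over error probabilities rather than over the raw Q values; I have not checked this against what the basecaller actually does, so treat `mean_quality` and `classify` as illustrative helpers, not as the real implementation.

```python
import math

Q_THRESHOLD = 7.0  # cutoff quoted above; assumed here, not read from any config


def mean_quality(quality_string, offset=33):
    """Mean Phred quality of one read, averaged over error probabilities.

    Assumes Phred+33 encoded qualities (standard FASTQ).
    """
    if not quality_string:
        raise ValueError("empty quality string")
    # Convert each quality character to its error probability.
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    # Average the probabilities, then convert back to the Phred scale.
    return -10 * math.log10(sum(probs) / len(probs))


def classify(quality_string, threshold=Q_THRESHOLD):
    """Return 'pass' if the mean quality is above the threshold, else 'fail'."""
    return "pass" if mean_quality(quality_string) > threshold else "fail"
```

Averaging the error probabilities gives a lower value than averaging the Q scores directly whenever a read mixes good and bad bases, so a read can fail here even if most of its bases look fine.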
Whether you want to combine the pass and fail reads is up to you and depends on your application. I tend to keep them both.
The subdirectories are made per 4000 reads (I believe that's the default) to avoid directories with far too many files (fast5 format).
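Because the reads are spread over numbered subdirectories, you usually want to gather them again before analysis. A minimal sketch, assuming a layout like `run_dir/pass/0/*.fast5` (the function name `collect_fast5` and the exact layout are my assumptions, so adjust them to what you see on disk):

```python
from pathlib import Path


def collect_fast5(run_dir, categories=("pass",)):
    """Collect .fast5 files from the chosen category directories.

    Searches every numbered subdirectory recursively, so it does not
    matter how many reads ended up in each one.
    """
    run_dir = Path(run_dir)
    files = []
    for category in categories:
        category_dir = run_dir / category
        if not category_dir.is_dir():
            continue  # e.g. a run without a skip directory
        files.extend(sorted(category_dir.rglob("*.fast5")))
    return files
```

Calling `collect_fast5("my_run")` returns only the pass reads, while `collect_fast5("my_run", categories=("pass", "fail"))` keeps both sets, which is what I tend to do.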