Tool: nanotimeparse - a bash solution for parsing Oxford Nanopore fastq by read generation time
14 months ago by
raplayer10 wrote:

As the title suggests, nanotimeparse is a new bash tool (its only external dependency is GNU parallel) that parses an Oxford Nanopore fastq file by read generation time.

You can git clone it here:

nanotimeparse takes an Oxford Nanopore Technologies (ONT) basecalled fastq file (-i) and subsets its reads into slices of (-s) minutes over a period of (-p) minutes. The output is 2 sets of n fasta files each (n = p/s). In Set 1, each file is cumulative: it contains all reads generated from the start of the ONT run up to that time slice. In Set 2, each file contains only the reads newly generated since the previous time slice.
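To make the two output sets concrete, here is a rough Python sketch of the binning logic. It is not the tool's actual bash implementation. It assumes each fastq header carries a `start_time=YYYY-MM-DDTHH:MM:SSZ` field (as ONT basecallers write it; newer versions may use a different timestamp format), that the run start is the earliest read time, and that reads landing exactly on a boundary go to the later slice. The function names `parse_start` and `slice_reads` are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed header field, e.g. "start_time=2019-03-01T10:30:00Z"
START_RE = re.compile(r"start_time=(\S+)")


def parse_start(header):
    """Extract the start_time field from a fastq header line as a datetime."""
    match = START_RE.search(header)
    if match is None:
        raise ValueError(f"no start_time field in header: {header!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")


def slice_reads(headers, slice_min, period_min):
    """Bin read headers into n = period_min // slice_min time slices.

    Returns (cumulative, incremental): two lists of n lists of headers.
    cumulative[k] holds every read from the run start up to slice k;
    incremental[k] holds only the reads that fall inside slice k.
    """
    n = period_min // slice_min
    timed = [(parse_start(h), h) for h in headers]
    run_start = min(t for t, _ in timed)  # assumption: run starts at first read
    cumulative = [[] for _ in range(n)]
    incremental = [[] for _ in range(n)]
    for t, h in timed:
        elapsed_min = (t - run_start) / timedelta(minutes=1)
        idx = int(elapsed_min // slice_min)
        if idx >= n:
            continue  # read was generated after the requested period
        incremental[idx].append(h)
        for k in range(idx, n):  # a read belongs to every later cumulative set
            cumulative[k].append(h)
    return cumulative, incremental


if __name__ == "__main__":
    demo = [
        "@r1 start_time=2019-03-01T10:00:00Z",
        "@r2 start_time=2019-03-01T10:30:00Z",
        "@r3 start_time=2019-03-01T11:01:00Z",
    ]
    cum, inc = slice_reads(demo, slice_min=60, period_min=120)
    print([len(x) for x in cum], [len(x) for x in inc])
```

With -s 60 and -p 120 this gives n = 2 slices; the demo reads at 0, 30 and 61 minutes end up as 2 and 3 reads in the cumulative set and 2 and 1 reads in the incremental set.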

As for runtime: with 10 threads (@2.1GHz), slicing every hour over a 48 hour period, nanotimeparse takes about 30 minutes to parse 1.2M reads (~15GB) into both sets. Memory peaks at roughly (15GB/2)*10 = 75GB (see 'Memory Considerations' in the README). You can reduce memory consumption by reducing the number of threads.
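If you want to pick a thread count for your own machine, the memory figure above can be turned into a quick back-of-the-envelope calculation. This is only a sketch that generalizes the (fastq size / 2) * threads arithmetic quoted in this post; the helper name `estimate_peak_memory_gb` is made up here, and the README's 'Memory Considerations' section remains the authoritative source.

```python
def estimate_peak_memory_gb(fastq_gb, threads):
    """Rough peak memory estimate, following the (size / 2) * threads figure above."""
    return fastq_gb / 2 * threads


if __name__ == "__main__":
    # The example from the post: a ~15GB fastq parsed with 10 threads
    print(estimate_peak_memory_gb(15, 10))  # 75.0
    # The same file with 4 threads
    print(estimate_peak_memory_gb(15, 4))   # 30.0
```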

