Hi everyone,
I'm running into a persistent issue with minimap2, and I’d really appreciate advice from people who have experience mapping Nanopore reads against very large reference collections.
I’m working with a custom database that contains about 5.56 million sequences, representing ~4.3 Gb of total sequence length. The index builds correctly with minimap2 (version 2.28 on an HPC system), and simple tests show that the index loads without problems — for example, mapping a single Nanopore read works as expected.
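A quick way to double-check those counts is an awk pass over the FASTA. Sketched here on a tiny inline demo file; in practice the real database FASTA goes in its place:

```shell
# Tiny demo FASTA standing in for the real database (placeholder data).
printf '>s1\nACGTACGT\n>s2\nACGT\n' > db_demo.fasta
# Header lines start with '>'; every other line contributes bases.
awk '/^>/ {n++; next} {bases += length($0)} END {print n " sequences, " bases " bp"}' db_demo.fasta
# → 2 sequences, 12 bp
```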
The problem appears as soon as I try to map a larger batch of reads. Even when I reduce the input to something small like 40,000 Nanopore reads, minimap2 crashes with a segmentation fault. This happens regardless of the number of threads (1, 2, 4, 8, or 16) and regardless of the amount of memory I request through SLURM (16 GB, 32 GB, 64 GB, or even 128 GB).
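One simple way to produce such fixed-size subsets is standard `split`: a FASTQ record is four lines, so 40,000 reads correspond to 160,000 lines. A minimal sketch on a synthetic two-read file (`reads_demo.fastq` stands in for the real input):

```shell
# Synthetic two-read FASTQ standing in for the real input (placeholder data).
printf '@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n' > reads_demo.fastq
# 4 lines per FASTQ record => -l 160000 gives 40,000-read chunks.
# (The demo file is tiny, so everything lands in the first chunk, chunk_aa.)
split -l 160000 reads_demo.fastq chunk_
wc -l < chunk_aa   # line count of the first chunk (8 here)
```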
Here is an example of a failing command:
minimap2 -t 4 -ax map-ont -N 1 \
    DB_ALL_CONCAT.mmi \
    reads.fastq > alignments.sam
=> Segmentation fault (core dumped)
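Could this be related to the index being multi-part? The reference (~4.3 Gb) exceeds minimap2's default indexing batch size (`-I 4G`), so the .mmi is presumably split into parts, and the documentation warns that multi-part indexes interact badly with SAM output. If that is the cause, would either of these variants be the right fix? Both are untested sketches on my side, and `DB_ALL_CONCAT.fasta` is just my shorthand for the source FASTA:

```shell
# Rebuild the index in a single part so the whole ~4.3 Gb reference fits
# (the -I batch size must be set at index-build time, not at mapping time):
minimap2 -I 8G -d DB_ALL_CONCAT.mmi DB_ALL_CONCAT.fasta

# Or keep a multi-part index but let minimap2 write per-part results
# to temporary files and merge them for proper SAM output:
minimap2 -t 4 -ax map-ont -N 1 --split-prefix tmp_split \
    DB_ALL_CONCAT.mmi reads.fastq > alignments.sam
```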
There is no SLURM or kernel OOM (out-of-memory) message — the failure comes directly from minimap2 itself. The segfault happens even with a single thread and without any advanced options.
Mapping against smaller databases (for example, genus-specific reference sets of a few thousand sequences) works perfectly and finishes within minutes. Only the combination of a very large database + tens of thousands of Nanopore reads triggers the crash.
Given that indexing works fine, that small subsets of reads work fine, and that the crash happens even with generous amounts of RAM allocated, I’m wondering whether this might be:
a memory fragmentation problem triggered by extremely large reference sets,
a limitation or bug specific to minimap2 2.28,
an internal buffer issue when using the map-ont preset,
or simply unavoidable memory pressure due to the size of the reference database.
Before going further, I would like to know whether other users have successfully mapped Nanopore reads against reference collections of this size (millions of sequences, several gigabases total), and whether there are recommended minimap2 settings for such extreme cases.
Would compiling an older version of minimap2 (such as 2.26) help? Unfortunately, my HPC cluster only provides version 2.28 via its module system. I could install another version locally, but before doing that I’d like to know whether this problem is known or expected with minimap2.
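If a local build is the way to go, my understanding is that minimap2 compiles from source with a plain `make` and needs no root privileges. Untested on my cluster; sketch only:

```shell
# Fetch and build a specific release locally.
git clone https://github.com/lh3/minimap2
cd minimap2
git checkout v2.26
make
./minimap2 --version
```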
Any advice on whether this is a tuning issue, a known minimap2 limitation, or a version-specific bug would be extremely helpful.

Many thanks in advance for your time!