Hi everyone,
I'm running into a persistent issue with minimap2, and I’d really appreciate advice from people who have experience mapping Nanopore reads against very large reference collections.
I’m working with a custom database that contains about 5.56 million sequences, representing ~4.3 Gb of total sequence length. The index builds correctly with minimap2 (version 2.28 on an HPC system), and simple tests show that the index loads without problems — for example, mapping a single Nanopore read works as expected.
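A quick way to double-check those counts is an awk pass over the FASTA. Sketched here on a tiny inline demo file; in practice the real database FASTA goes in its place:

```shell
# Tiny demo FASTA standing in for the real database (placeholder data).
printf '>s1\nACGTACGT\n>s2\nACGT\n' > db_demo.fasta
# Header lines start with '>'; every other line contributes bases.
awk '/^>/ {n++; next} {bases += length($0)} END {print n " sequences, " bases " bp"}' db_demo.fasta
# → 2 sequences, 12 bp
```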
The problem appears as soon as I try to map a larger batch of reads. Even when I reduce the input to something small like 40,000 Nanopore reads, minimap2 crashes with a segmentation fault. This happens regardless of the number of threads (1, 2, 4, 8, or 16) and regardless of the amount of memory I request through SLURM (16 GB, 32 GB, 64 GB, or even 128 GB).
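One simple way to produce such fixed-size subsets is standard `split`: a FASTQ record is four lines, so 40,000 reads correspond to 160,000 lines. A minimal sketch on a synthetic two-read file (`reads_demo.fastq` stands in for the real input):

```shell
# Synthetic two-read FASTQ standing in for the real input (placeholder data).
printf '@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n' > reads_demo.fastq
# 4 lines per FASTQ record => -l 160000 gives 40,000-read chunks.
# (The demo file is tiny, so everything lands in the first chunk, chunk_aa.)
split -l 160000 reads_demo.fastq chunk_
wc -l < chunk_aa   # line count of the first chunk (8 here)
```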
Here is an example of a failing command:
minimap2 -t 4 -ax map-ont -N 1 \
    DB_ALL_CONCAT.mmi \
    reads.fastq > alignments.sam
=> Segmentation fault (core dumped)
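Could this be related to the index being multi-part? The reference (~4.3 Gb) exceeds minimap2's default indexing batch size (`-I 4G`), so the .mmi is presumably split into parts, and the documentation warns that multi-part indexes interact badly with SAM output. If that is the cause, would either of these variants be the right fix? Both are untested sketches on my side, and `DB_ALL_CONCAT.fasta` is just my shorthand for the source FASTA:

```shell
# Rebuild the index in a single part so the whole ~4.3 Gb reference fits
# (the -I batch size must be set at index-build time, not at mapping time):
minimap2 -I 8G -d DB_ALL_CONCAT.mmi DB_ALL_CONCAT.fasta

# Or keep a multi-part index but let minimap2 write per-part results
# to temporary files and merge them for proper SAM output:
minimap2 -t 4 -ax map-ont -N 1 --split-prefix tmp_split \
    DB_ALL_CONCAT.mmi reads.fastq > alignments.sam
```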
There is no SLURM or kernel OOM (out-of-memory) message — the failure comes directly from minimap2 itself. The segfault happens even with a single thread and without any advanced options.
Mapping against smaller databases (for example, genus-specific reference sets of a few thousand sequences) works perfectly and finishes within minutes. Only the combination of a very large database + tens of thousands of Nanopore reads triggers the crash.
Given that indexing works fine, that small subsets of reads work fine, and that the crash happens even with generous amounts of RAM allocated, I’m wondering whether this might be:
a memory fragmentation problem triggered by extremely large reference sets,
a limitation or bug specific to minimap2 2.28,
an internal buffer issue when using the map-ont preset,
or simply unavoidable memory pressure due to the size of the reference database.
Before going further, I would like to know whether other users have successfully mapped Nanopore reads against reference collections of this size (millions of sequences, several gigabases total), and whether there are recommended minimap2 settings for such extreme cases.
Would compiling an older version of minimap2 (such as 2.26) help? Unfortunately, my HPC cluster only provides version 2.28 via its module system. I could install another version locally, but before doing that I’d like to know whether this problem is known or expected with minimap2.
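If a local build is the way to go, my understanding is that minimap2 compiles from source with a plain `make` and needs no root privileges. Untested on my cluster; sketch only:

```shell
# Fetch and build a specific release locally.
git clone https://github.com/lh3/minimap2
cd minimap2
git checkout v2.26
make
./minimap2 --version
```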
Any advice on whether this is a tuning issue, a known minimap2 limitation, or a version-specific bug would be extremely helpful.

Many thanks in advance for your time!