samtools index: Numerical result out of range for both .bai and .csi
8 days ago
tnminh89 • 0

Hi,

I am trying to create an index (I tried both .bai and .csi) for a .bam file using samtools:

samtools index -c realigned.bam


But I kept having this error:

[E::hts_idx_push] Region 1076826223..1076826224 cannot be stored in a csi index with min_shift = 15, n_lvls = 5. Try using min_shift = 14, n_lvls >= 6
samtools index: failed to create index for "realigned.bam": Numerical result out of range


I also tried changing the min_shift from 14 to 15 but the error was still there.

samtools index -c -m 15 realigned.bam


Can someone tell me what I did wrong? I would really appreciate the help!

csi index samtools bai • 221 views

You should probably show your code for making the realigned bam, that seems to be the cause of the problem.

8 days ago
d-cameron ★ 2.8k

Can someone tell me what I did wrong? I would really appreciate the help!

There are a few possible root causes for this:

• Long contigs. Really long contigs (>MAX_INT) are not supported by BAM at all (since it uses a 32-bit integer for position). Long (>512 Mb) contigs are not supported by .bai. Solution: use CRAM.

• Alignments outside of reference. When building a .csi index, samtools looks at the header contig lengths to work out the index structure. If you've got alignments that sit well past the end of a chromosome (position 1,076,826,223 is well outside any human/mouse chromosome), then the index structure will have been inferred incorrectly. Solution: fix your nonsensical alignments.

• You ran samtools index -m 15 (hence the min_shift error). The default has always been 14. Solution: let samtools infer the index structure.

• You're running a very old version of samtools. Solution: upgrade.
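
To check the second cause above (alignments past the end of a contig), a minimal sketch along these lines may help. It assumes the BAM header carries accurate @SQ lengths, and it only compares each record's start position (POS) against the contig length; it does not parse CIGAR strings, so reads that start inside a contig but run off its end are missed. `find_out_of_range` is a made-up helper name, not part of samtools.

```shell
# Hypothetical helper (not a samtools command): reads SAM text, header
# included, on stdin and prints every record whose start position lies
# beyond the @SQ LN of its contig, as: QNAME, RNAME, POS, contig length.
find_out_of_range() {
    awk -F '\t' '
        /^@SQ/ {
            # Collect the contig name (SN) and length (LN) from the header.
            name = ""; len = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len = substr($i, 4) + 0
            }
            ln[name] = len
            next
        }
        /^@/ { next }   # skip the remaining header lines
        ($3 in ln) && ($4 + 0 > ln[$3]) {
            print $1 "\t" $3 "\t" $4 "\t" ln[$3]
        }
    '
}

# Intended use (requires samtools; left commented out here):
# samtools view -h realigned.bam | find_out_of_range | head
```

If this prints anything, the realignment step is the thing to look at rather than the indexing command.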


Sadly CRAM also has the 2GB size limit for chromosomes. It's something fixed by CRAM 4.0, but that needs a second specification maintainer to proceed.

Note that for things that are REALLY long, don't discount SAM! When compressed with bgzf it's not really any different in size to BAM (possibly smaller), and with htslib it's not that much slower. In fact, with multi-threading it can sometimes even outperform BAM. Obviously it'd also need CSI indices.

SAM has no limitation on length as it's textual. (Well it's 2^63, but that's more than enough.)

As to the actual problem above - it's possible this may have been one of the things we fixed in CSI. Please make sure you're using the latest samtools/htslib first before reporting issues.
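
For reference, the numbers in the error message at the top of the thread can be worked through by hand. As far as I understand the CSI layout, the largest coordinate an index can hold is 2^(min_shift + 3 * n_lvls); a .bai index is the fixed case min_shift = 14, n_lvls = 5. The sketch below is plain shell arithmetic under that assumption, and `csi_max_coord` is a made-up helper name, not a samtools command.

```shell
# Largest coordinate a binning index can hold, assuming the limit is
# 2^(min_shift + 3 * n_lvls). Hypothetical helper, not part of samtools.
csi_max_coord() {
    min_shift=$1
    n_lvls=$2
    echo $(( 1 << (min_shift + 3 * n_lvls) ))
}

csi_max_coord 14 5   # .bai layout:                     536870912  (512 Mb)
csi_max_coord 15 5   # layout named in the error:       1073741824
csi_max_coord 14 6   # layout suggested by the error:   4294967296
```

The reported region ends at 1,076,826,224, which is just above 1,073,741,824, so it does not fit with min_shift = 15 and n_lvls = 5, while min_shift = 14 with six levels would hold it. That is consistent with the message, though it says nothing about whether an alignment at that position makes sense for the reference.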