Read block operation failed with BAM file
3
0
Entering edit mode
7 months ago

I am trying to automate extracting data from an indexed BAM file and I am getting read errors:

[E::bgzf_read_block] Invalid BGZF header at offset 51686517456
[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes

This failure happens after about 12 hours of analysis, near the end of the 52GB BAM file:

$ ll HUDEP.control.DS182418.chr8.bam*
-rw-r--r-- 1 areynolds stamlab 51901713847 Jun 15 12:46 HUDEP.control.DS182418.chr8.bam
-rw-r--r-- 1 areynolds stamlab     7250712 Jun 15 12:59 HUDEP.control.DS182418.chr8.bam.bai

I ran samtools quickcheck on it, which reports no issues:

$ samtools quickcheck -vvv HUDEP.control.DS182418.chr8.bam
verbosity set to 3
checking HUDEP.control.DS182418.chr8.bam
opened HUDEP.control.DS182418.chr8.bam
HUDEP.control.DS182418.chr8.bam is sequence data
HUDEP.control.DS182418.chr8.bam has 25 targets in header.
HUDEP.control.DS182418.chr8.bam has good EOF block.

I can otherwise read data from it with manual samtools and pysam commands to random locations within the BAM file.

$ python
Python 3.8.16 (default, Mar  2 2023, 03:21:46) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pysam
>>> print(pysam.__version__)
0.21.0

Remaking the BAM file might be an option but it is very time-consuming. I'd like to understand what went wrong, so I don't have to remake files again.

Is that the only option or are there other tests and fixes I could apply, before this?

bgzip samtools bam pysam htslib • 1.5k views
ADD COMMENT
0
Entering edit mode

https://github.com/samtools/htslib/pull/1676 patches a seek issue in htslib (upon which samtools and pysam depend)

ADD REPLY
0
Entering edit mode

Indeed, though it is a seek issue when accessing files over HTTP and other network protocols. So, unless there's something you didn't mention and this question is not in fact about accessing files locally, that is unlikely to be the answer to this question.

ADD REPLY
2
Entering edit mode
7 months ago

“Invalid BGZF header” means that the index has sent the BGZF reader to a location that is not the start of a BGZF block, so the index is incorrect or not in sync with the BAM file. Hence:

  • When was the index made? It's common to get this error when using an old .bai index file with a newly regenerated BAM file. Try nuking the .bai file and remaking it with samtools index.

  • How was the index made? There have previously been (and currently are, when multithreading) bugs with the on-the-fly index writing. So if you made the index via --write-index --threads …, try nuking the .bai file and remaking it with samtools index.

ADD COMMENT
0
Entering edit mode

For the latter, see e.g. HTSlib PR #1672.

ADD REPLY
0
Entering edit mode

Good to know, thanks. I am making the BAM file with samtools merge and the index with samtools index, immediately after the BAM file is generated. So far as I can tell, neither case applies.

ADD REPLY
0
Entering edit mode
7 months ago

other tests and fixes I

test the bam is correctly gzip-ped ?

gunzip -t in.bam && echo OK
ADD COMMENT
0
Entering edit mode

Both files test okay:

$ gunzip -t HUDEP.treatment.DS182417.chr8.bam && echo OK
OK
$ gunzip -t HUDEP.control.DS182418.chr8.bam && echo OK
OK

Weird.

ADD REPLY
0
Entering edit mode
7 months ago
size_t ▴ 120

this tool may be useful: bamrescue

ADD COMMENT

Login before adding your answer.

Traffic: 1521 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6