Tutorial: Dealing with Nanopore fast5 files compressed with vbz
2.3 years ago

I spent quite a bit of effort troubleshooting this, so I thought I'd post it for my own reference and, hopefully, for the benefit of the community.

Starting with some MinKNOW release (I think), Nanopore sequencing runs began producing fast5 files compressed with ONT's custom vbz algorithm instead of gzip.

Because of vbz compression, any software that reads the raw signal from these fast5 files will fail with more or less cryptic errors unless it can load the vbz plugin. For example, I got:

h5dump FAQ95459_3d12db00_0.fast5
h5dump error: unable to print data

Or in Python:

import h5py

fn = 'FAQ95459_3d12db00_0.fast5'

fin = h5py.File(fn, "r")
a_read = list(fin.keys())[0]
list(fin[a_read]['Raw']['Signal'])

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/export/home/db291g/.local/lib/python3.7/site-packages/h5py/_hl/dataset.py", line 664, in __iter__
    yield self[i]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/export/home/db291g/.local/lib/python3.7/site-packages/h5py/_hl/dataset.py", line 710, in __getitem__
    return self._fast_reader.read(args)
  File "h5py/_selector.pyx", line 366, in h5py._selector.Reader.read
OSError: Can't read data (can't open directory: /usr/local/hdf5/lib/plugin)

Or using DNAscent:

DNAscent/bin/DNAscent detect --bam aln.bam --reference ref/genome.fasta --index index.dnascent --output out.detect

Loading DNAscent index... ok.
Loading DNAscent index... ok.
Importing reference... ok.
Opening bam file... ok.
Importing reference... ok.
Opening bam file... ok.
Scanning bam file...ok.
HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 140573004461824:n 0sec  failed:      0  
  #000: H5Dio.c line 173 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: H5Dio.c line 550 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #002: H5Dchunk.c line 1872 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #003: H5Dchunk.c line 2902 in H5D__chunk_lock(): data pipeline read failed
    major: Data filters
    minor: Filter operation failed
  #004: H5Z.c line 1357 in H5Z_pipeline(): required filter 'vbz' is not registered
    major: Data filters
    minor: Read failed
  #005: H5PL.c line 298 in H5PL_load(): search in paths failed
    major: Plugin for dynamically loaded library
    minor: Can't get value
  #006: H5PL.c line 402 in H5PL__find(): can't open directory
    major: Plugin for dynamically loaded library
    minor: Can't open directory or file
DNAscent: src/event_handling.cpp:643: void normaliseEvents(read&, bool): Assertion `et.n > 0' failed.
Scanning bam file...ok.
HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 140141318317824:n 0sec  failed:      0  
  #000: H5Dio.c line 173 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: H5Dio.c line 550 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #002: H5Dchunk.c line 1872 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #003: H5Dchunk.c line 2902 in H5D__chunk_lock(): data pipeline read failed
    major: Data filters
    minor: Filter operation failed
  #004: H5Z.c line 1357 in H5Z_pipeline(): required filter 'vbz' is not registered
    major: Data filters
    minor: Read failed
  #005: H5PL.c line 298 in H5PL_load(): search in paths failed
    major: Plugin for dynamically loaded library
    minor: Can't get value
  #006: H5PL.c line 402 in H5PL__find(): can't open directory
    major: Plugin for dynamically loaded library
    minor: Can't open directory or file
DNAscent: src/event_handling.cpp:643: void normaliseEvents(read&, bool): Assertion `et.n > 0' failed.
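All three errors come down to the same thing: the Raw/Signal datasets are stored with the vbz filter (HDF5 filter ID 32020), and HDF5 cannot find a plugin to decode it ("required filter 'vbz' is not registered"). Below is a minimal sketch for checking which filters a dataset uses, assuming h5py and the multi-read fast5 layout /<read>/Raw/Signal seen in the Python example above. The helper names are hypothetical, not part of any library. Inspecting the filter pipeline only reads dataset metadata, so as far as I know it works even before the plugin is installed.

```python
# Sketch: report which HDF5 filters the Raw/Signal datasets of a fast5 file use.
# Helper names are hypothetical; assumes h5py and the multi-read layout /<read>/Raw/Signal.

VBZ_FILTER_ID = 32020  # HDF5 filter ID registered for ONT's vbz compression

# A few well-known HDF5 filter IDs, for readable output
KNOWN_FILTERS = {1: "gzip", 2: "shuffle", VBZ_FILTER_ID: "vbz"}


def filter_ids(dset):
    """Return the filter IDs in an h5py Dataset's filter pipeline, in order."""
    plist = dset.id.get_create_plist()  # dataset creation property list
    # get_filter(i) returns (filter_code, flags, cd_values, name); keep the code
    return [plist.get_filter(i)[0] for i in range(plist.get_nfilters())]


def describe(ids):
    """Map filter IDs to names, falling back to 'unknown-<id>'."""
    return [KNOWN_FILTERS.get(i, f"unknown-{i}") for i in ids]


def signal_filters(path):
    """Return {read name: [filter names]} for every /<read>/Raw/Signal dataset."""
    import h5py  # imported here so the helpers above do not require h5py

    with h5py.File(path, "r") as fin:
        return {
            read: describe(filter_ids(fin[read]["Raw"]["Signal"]))
            for read in fin
        }
```

For example, signal_filters("FAQ95459_3d12db00_0.fast5") should list "vbz" for each read if the file needs the plugin, and "gzip" for files written the old way.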

Solution:

You need to install Nanopore's vbz HDF5 plugin so that HDF5 can decompress vbz data. Thankfully, it's available on bioconda:

mamba install -c conda-forge -c bioconda ont_vbz_hdf_plugin

NB: The first time you install it, you need to deactivate and re-activate the environment so that the HDF5_PLUGIN_PATH variable gets exported. Alternatively, export the variable yourself:

export HDF5_PLUGIN_PATH="${CONDA_PREFIX}/hdf5/lib/plugin/"
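If your pipeline is driven from Python, the same variable can be set from inside the script. This is a minimal sketch, assuming the conda layout used in the export line above ($CONDA_PREFIX/hdf5/lib/plugin) and assuming the plugin library has "vbz" in its file name (e.g. libvbz_hdf_plugin.so on Linux). The function names are hypothetical. As far as I know, HDF5 reads HDF5_PLUGIN_PATH when the library initialises, so this has to run before h5py is imported.

```python
# Sketch: point HDF5 at the conda vbz plugin directory from Python.
# Function names are hypothetical; the directory layout mirrors the export line above.
import os
from pathlib import Path


def vbz_plugin_dir(conda_prefix=None):
    """Return $CONDA_PREFIX/hdf5/lib/plugin (or the same under conda_prefix)."""
    prefix = conda_prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("CONDA_PREFIX is not set; is a conda environment active?")
    return Path(prefix) / "hdf5" / "lib" / "plugin"


def find_vbz_plugin(plugin_dir):
    """List files in plugin_dir whose name contains 'vbz' (empty if dir is missing)."""
    plugin_dir = Path(plugin_dir)
    if not plugin_dir.is_dir():
        return []
    return sorted(p.name for p in plugin_dir.iterdir() if "vbz" in p.name.lower())


def ensure_plugin_path(conda_prefix=None, environ=None):
    """Prepend the plugin dir to HDF5_PLUGIN_PATH, keeping any existing entries."""
    environ = os.environ if environ is None else environ
    plugin_dir = str(vbz_plugin_dir(conda_prefix))
    # HDF5 accepts several directories separated by ':' (';' on Windows)
    parts = [p for p in environ.get("HDF5_PLUGIN_PATH", "").split(os.pathsep) if p]
    if plugin_dir not in parts:
        parts.insert(0, plugin_dir)
    environ["HDF5_PLUGIN_PATH"] = os.pathsep.join(parts)
    return environ["HDF5_PLUGIN_PATH"]
```

Typical use is to call ensure_plugin_path() and only then import h5py. If find_vbz_plugin(vbz_plugin_dir()) returns an empty list, the plugin package is probably not installed in the active environment. Recent h5py versions also expose h5py.h5pl.append() for adding a plugin search path at runtime.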

Alternatively, you can convert vbz-compressed files to gzip with ont_fast5_api (its compress_fast5 tool), also available on bioconda.
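The conversion can also be scripted. Below is a minimal sketch that wraps the compress_fast5 command-line tool from ont_fast5_api with subprocess. The wrapper names are hypothetical, and the option names (--input_path, --save_path, --compression, --threads, --recursive) are as I recall them from the ont_fast5_api README, so check compress_fast5 --help for your version. Note that the conversion still has to decode the vbz input, so the plugin must be reachable by whichever HDF5 the tool uses.

```python
# Sketch: convert vbz-compressed fast5 files to gzip via ont_fast5_api's compress_fast5.
# Wrapper names are hypothetical; verify the CLI options with `compress_fast5 --help`.
import shutil
import subprocess


def compress_fast5_cmd(input_dir, output_dir, compression="gzip", threads=1, recursive=False):
    """Build the argument list for compress_fast5 without running it."""
    cmd = [
        "compress_fast5",
        "--input_path", str(input_dir),
        "--save_path", str(output_dir),
        "--compression", compression,
        "--threads", str(threads),
    ]
    if recursive:
        cmd.append("--recursive")  # also search subdirectories of input_dir
    return cmd


def convert_to_gzip(input_dir, output_dir, **kwargs):
    """Run compress_fast5; raises if the tool is missing or exits non-zero."""
    if shutil.which("compress_fast5") is None:
        raise FileNotFoundError("compress_fast5 not found; install ont_fast5_api first")
    subprocess.run(compress_fast5_cmd(input_dir, output_dir, **kwargs), check=True)
```

Building the command separately from running it makes it easy to print or log the exact call before touching any data.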


Maybe it was just me, but none of this was obvious to me at all!


Thank you! I spent a long time trying to figure out how to run tailfindr, which also needs this vbz plugin, but couldn't get it working. Now I hope I can make it work.
