Question: What are the ".bax.h5" files generated by PacBio long-read sequencing?
Hi everyone!

I'm new to PacBio long-read sequencing and I've read a lot about what exactly the raw files produced by this type of sequencer contain. I understand that the two file types (.bas.h5 and .bax.h5) differ in what each one contains (sequence, quality values, information about the chemistry used...).

But as a beginner, I still don't understand how to convert these .bax.h5 files into a subreads.fastq file. I also don't know what exactly to give to an assembly pipeline (I'm using Falcon): should I give it a FASTQ file or the .bax.h5 files?

I have the same problem with the polishing step with Quiver, which requires the quality information contained in the original files.

I would really appreciate some explanation of this, and any advice you could give!



These are HDF5 files: you can extract them to FASTQ using (googling...) see
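
To give an idea of what such an extraction does, here is a minimal Python sketch, not the tool referred to above. It assumes the base calls and quality values are stored as two long arrays with all reads back to back, plus a per-read base count. The dataset paths in load_arrays (and the use of the h5py library) are my assumptions about the .bax.h5 layout, so check them against your own files, for example with h5ls. All function names and the "read/N" naming are made up for illustration. Note that this yields whole reads per ZMW, not adapter-trimmed subreads, so it is not a replacement for a proper subreads.fastq.

```python
def phred_to_ascii(qvs, offset=33):
    """Encode integer Phred scores as a FASTQ quality string (offset 33)."""
    # Cap at 93 so every character stays within printable ASCII ('~').
    return "".join(chr(min(int(q), 93) + offset) for q in qvs)


def split_reads(basecalls, qvs, num_events):
    """Yield (read_index, sequence, qualities) from concatenated arrays.

    basecalls  -- bytes holding the ASCII bases of all reads, back to back
    qvs        -- integer quality values, same length as basecalls
    num_events -- number of bases in each read, in storage order
    """
    if len(basecalls) != len(qvs):
        raise ValueError("base calls and quality values differ in length")
    start = 0
    for index, n in enumerate(num_events):
        end = start + int(n)
        if end > start:  # skip wells that produced no bases
            yield index, basecalls[start:end].decode("ascii"), qvs[start:end]
        start = end


def fastq_record(name, seq, qvs):
    """Format one four-line FASTQ record."""
    return "@{}\n{}\n+\n{}\n".format(name, seq, phred_to_ascii(qvs))


def load_arrays(path):
    """Read the three arrays from a .bax.h5 file.

    ASSUMPTION: the dataset paths below are my understanding of the
    layout, not something confirmed in this thread. Requires h5py.
    """
    import h5py  # third-party; imported here so the rest runs without it

    with h5py.File(path, "r") as f:
        bases = f["/PulseData/BaseCalls/Basecall"][()].tobytes()
        qvs = f["/PulseData/BaseCalls/QualityValue"][()]
        counts = f["/PulseData/BaseCalls/ZMW/NumEvent"][()]
    return bases, qvs, counts


def write_fastq(bax_path, fastq_path):
    """Write every non-empty read of one .bax.h5 file to a FASTQ file."""
    bases, qvs, counts = load_arrays(bax_path)
    with open(fastq_path, "w") as out:
        for index, seq, quals in split_reads(bases, qvs, counts):
            # "read/<index>" is a placeholder name, not PacBio's naming scheme.
            out.write(fastq_record("read/{}".format(index), seq, quals))
```

With toy data, split_reads(b"ACGTGG", [10, 20, 30, 40, 5, 0], [4, 0, 2]) gives two reads, "ACGT" and "GG", and skips the empty well in between.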

Thanks for your answer! I was looking for such a tool but didn't find it... Do you also know which file should be used for genome assembly? The HDF5 files or the FASTQ file?

There are multiple options. The current recommendation seems to be Canu (I think you have plenty of coverage, if I remember correctly from the other threads you have posted).

