Question

What are the ".bax.h5" files generated by PacBio long-read sequencing?

7.8 years ago

Rox ★ 1.4k

Hi everyone !

I'm new to PacBio long-read sequencing and I've read a lot about what exactly the raw files produced by this type of sequencer contain. I understand that the two different types (.bas.h5 and .bax.h5) refer to what each file contains (sequence, quality values, information about the chemistry used...).

But as a beginner, I still don't understand how to transform these .bax.h5 files into a subreads.fastq file, and I also don't know what exactly to give to an assembly pipeline (I'm using Falcon): should I give it a fastq file or a .bax.h5 file?

I've got the same problem for the polishing step with Quiver, which requires the quality information contained in the original files.

I really need some explanations about that, so I would be grateful for any advice!

Cheers,

Roxane

Assembly sequencing • 5.7k views

updated 7.8 years ago by Pierre Lindenbaum 161k • written 7.8 years ago by Rox ★ 1.4k

score 1 · Answer 1 · 2016-07-04

7.8 years ago

Pierre Lindenbaum 161k

These are HDF5 files (https://en.wikipedia.org/wiki/Hierarchical_Data_Format). You can convert them to fastq using pbh5tools, which I found by googling: https://github.com/PacificBiosciences/pbh5tools/ (see the documentation at https://github.com/PacificBiosciences/pbh5tools/blob/master/doc/index.rst).
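As a quick check that a .bax.h5 file really is HDF5 before passing it to a converter, one can look for the HDF5 signature bytes. This is only a minimal Python sketch using the standard library and is not part of pbh5tools; the function name `looks_like_hdf5`, the `max_offset` cap, and the file name in the usage note are made up for illustration.

```python
from pathlib import Path

# The 8-byte signature that starts the HDF5 superblock (per the HDF5 file format spec).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path, max_offset=1 << 20):
    """Return True if the HDF5 signature is found at offset 0, 512, 1024, 2048, ...

    The format allows an optional user block before the superblock, so the
    signature may sit at any of those offsets. `max_offset` is an arbitrary
    cap chosen for this sketch (an assumption, not a value from the spec).
    """
    size = Path(path).stat().st_size
    with open(path, "rb") as handle:
        offset = 0
        while offset <= max_offset and offset + len(HDF5_SIGNATURE) <= size:
            handle.seek(offset)
            if handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Candidate offsets are 0, then 512 and successive doublings.
            offset = 512 if offset == 0 else offset * 2
    return False
```

Calling `looks_like_hdf5("my_movie.1.bax.h5")` (a hypothetical file name) should return True for a genuine HDF5 file and False for, say, a fastq file. To see what is actually stored inside, the `h5ls -r` command from the HDF5 tools lists the groups and datasets.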


Thanks for your answer! I was looking for such a tool but didn't find it... Do you also know which file should be used for genome assembly? The HDF5 files or the fastq file?

7.8 years ago by Rox ★ 1.4k

There are multiple options. The current recommendation seems to be canu (I think you have plenty of coverage, if I remember the other threads you have posted). https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Large-Genome-Assembly-with-PacBio-Long-Reads

7.8 years ago by GenoMax 141k