Hi everyone!
I'm new to PacBio long-read sequencing and I've read a lot about what exactly the raw files produced by this type of sequencer contain. I understand that the two different types (.bas.h5 and .bax.h5) refer to what each file contains (sequence, quality values, information about the chemistry used...).
But as a beginner, I still don't understand how to convert these .bax.h5 files into a subreads.fastq file, and I also don't know exactly what to give to an assembly pipeline (I'm using Falcon): should I give it a fastq file or a .bax.h5 file?
I have the same problem with the polishing step with Quiver, which requires the quality information contained in the original files.
I'd really appreciate some explanation of this, if you could please give me some advice!
Cheers,
Roxane
Thanks for your answer! I was looking for such a tool but didn't find it... Do you also know which file should be used for genome assembly? The HDF5 files or the fastq file?
There are multiple options. The current recommendation seems to be canu (I think you have plenty of coverage, if I remember the other threads you have posted). https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Large-Genome-Assembly-with-PacBio-Long-Reads
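If you want to sanity-check the coverage yourself once you have a subreads fastq, here is a rough Python sketch. It assumes a plain four-lines-per-record fastq (no wrapped sequences), and the function names and the genome size in the demo are made up for illustration; plug in your own file and expected genome size.

```python
import io


def fastq_total_bases(handle):
    """Count reads and total bases in a 4-line-per-record FASTQ stream."""
    reads = 0
    total = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            total += len(line.strip())
            reads += 1
    return reads, total


def estimate_coverage(total_bases, genome_size):
    """Approximate fold coverage: total sequenced bases / genome size."""
    return total_bases / genome_size


# Tiny in-memory demo; for a real file use open("subreads.fastq") instead.
demo = io.StringIO("@r1\nACGTACGT\n+\n!!!!!!!!\n@r2\nACGT\n+\n!!!!\n")
reads, bases = fastq_total_bases(demo)
# The genome size of 4 bp is a hypothetical value for the demo only.
print(reads, bases, estimate_coverage(bases, 4))
```

This only gives a raw estimate from read lengths; it says nothing about read quality or how the bases are distributed across the genome.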