Question: Pulse information of PacBio data
3.8 years ago
United States
mhasa006 wrote:

I have a set of PacBio long-read data in *.bax.h5 format. I want to detect DNA base modifications (DNA methylation) using the pulse information in the PacBio data. I know the pulse information is stored in pls.h5 files, but I don't have those files. Is it possible to extract some or any pulse information from the *.bax.h5 files? Do the *.bax.h5 files contain any pulse information at all?

3.0 years ago
Duarte, CA
Charles Warden wrote:

1) You can create an unaligned .bam file with the pulse information using bax2bam:

path/to/blasr/utils/bax2bam/bin/bax2bam -o outputPrefix /path/to/file.1.bax.h5 /path/to/file.2.bax.h5 /path/to/file.3.bax.h5

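By default, bax2bam adds IPD but not PulseWidth information. However, you can choose which features to add with --pulsefeatures, and --losslessframes stores IPD and PulseWidth at full resolution instead of the default compressed encoding. For example: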

path/to/blasr/utils/bax2bam/bin/bax2bam -o outputPrefix /path/to/file.1.bax.h5 /path/to/file.2.bax.h5 /path/to/file.3.bax.h5  --pulsefeatures=DeletionQV,DeletionTag,InsertionQV,IPD,PulseWidth,MergeQV,SubstitutionQV,SubstitutionTag --losslessframes
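
An RS II movie is usually split into three parts (*.1.bax.h5, *.2.bax.h5, *.3.bax.h5), and all parts of a movie are passed to bax2bam together, as in the commands above. Below is a minimal bash sketch that collects the parts for one movie and runs the second command. The function name run_bax2bam and the BAX2BAM variable are hypothetical, the part naming is assumed to follow the .N.bax.h5 pattern shown above, and only the flags from the answer are used:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run bax2bam on every .N.bax.h5 part of one movie,
# requesting IPD and PulseWidth (plus the QV tracks) at full resolution.
# BAX2BAM defaults to the placeholder path from the answer; override it for your build.
BAX2BAM="${BAX2BAM:-path/to/blasr/utils/bax2bam/bin/bax2bam}"

run_bax2bam() {
    local movie_prefix="$1"   # e.g. /path/to/file (without the .N.bax.h5 suffix)
    local out_prefix="$2"     # passed to bax2bam -o
    local parts=()
    local part
    # The glob expands in sorted order (.1, .2, .3); an unmatched glob stays
    # literal, so keep only paths that actually exist.
    for part in "${movie_prefix}".[0-9].bax.h5; do
        [[ -e "$part" ]] && parts+=("$part")
    done
    if (( ${#parts[@]} == 0 )); then
        echo "no bax.h5 files found for ${movie_prefix}" >&2
        return 1
    fi
    "$BAX2BAM" -o "$out_prefix" "${parts[@]}" \
        --pulsefeatures=DeletionQV,DeletionTag,InsertionQV,IPD,PulseWidth,MergeQV,SubstitutionQV,SubstitutionTag \
        --losslessframes
}
```

Usage: run_bax2bam /path/to/file outputPrefix. Setting BAX2BAM=echo first prints the command instead of running it, which is a quick way to check which parts were picked up.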

You can read more about the PacBio .bam file format here:

2) If you have an aligned cmp.h5 file (or a .sam alignment that you convert to a cmp.h5 file with samtoh5), you can use loadPulses to add the pulse (kinetic) information needed for base modification analysis:

loadPulses /path/to/file.bas.h5 /path/to/blasr.alignment.cmp.h5

bax2bam, loadPulses, and samtoh5 are part of the blasr package:

You can then use R-kinetics to parse and work with the base modification information in the alignment:

My understanding is that PacBio may not continue to maintain samtoh5 (as they switch to the .bam file format), but you can find it under /path/to/blasr/utils (if compiled). The same is true for loadPulses, unless you compile using pitchfork.

3.7 years ago
United States
ndunofficial wrote:

The bax.h5 files contain the pulse width (PW) as well as the interpulse duration (IPD), i.e. the time between consecutive pulses. These timing (kinetic) measurements are the signal that base modification detection relies on.
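
To check this on your own files, here is a small bash sketch that lists the kinetic datasets with the HDF5 command-line tool h5ls and converts frame counts to seconds. The dataset names PreBaseFrames (IPD) and WidthInFrames (PW) under /PulseData/BaseCalls, and the 75 frames-per-second rate for RS II movies, are assumptions here; confirm them against your data (h5ls -r lists every dataset in the file). The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Assumed layout: IPD stored as PreBaseFrames and PW as WidthInFrames,
# both under /PulseData/BaseCalls and counted in camera frames.

# List the kinetic datasets in one bax.h5 file (requires the HDF5 tools).
list_kinetics() {
    h5ls -r "$1" | grep -E 'PreBaseFrames|WidthInFrames'
}

# Convert whitespace-separated frame counts on stdin to seconds.
# $1 = frame rate in frames per second (assumed 75 for RS II; check your data).
frames_to_seconds() {
    local fps="${1:-75}"
    awk -v fps="$fps" '{
        for (i = 1; i <= NF; i++) printf "%s%.4f", (i > 1 ? " " : ""), $i / fps
        print ""
    }'
}
```

For example, list_kinetics /path/to/file.1.bax.h5 should show both datasets if the assumed layout holds, and echo '75 150 30' | frames_to_seconds 75 prints 1.0000 2.0000 0.4000.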

