Question: Pulse information of PacBio data
mhasa006 (United States) wrote, 4.6 years ago:

I have a set of PacBio long-read data in *.bax.h5 files. I want to detect DNA base modifications (DNA methylation) using the pulse information in the PacBio data. I know the pulse information is stored in *.pls.h5 files, but I don't have those files. Is it possible to extract any pulse information from the *.bax.h5 files? Do the *.bax.h5 files contain any pulse information at all?

Tags: pacbio, methylation
Charles Warden (Duarte, CA) wrote, 3.7 years ago:

1) You can create an unaligned .bam file with the pulse information using bax2bam:

path/to/blasr/utils/bax2bam/bin/bax2bam -o outputPrefix /path/to/file.1.bax.h5 /path/to/file.2.bax.h5 /path/to/file.3.bax.h5

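By default, IPD but not PulseWidth information is added. However, you can choose which features to add with --pulsefeatures, and --losslessframes stores the full frame values instead of the default compressed 8-bit encoding. For example: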

path/to/blasr/utils/bax2bam/bin/bax2bam -o outputPrefix /path/to/file.1.bax.h5 /path/to/file.2.bax.h5 /path/to/file.3.bax.h5  --pulsefeatures=DeletionQV,DeletionTag,InsertionQV,IPD,PulseWidth,MergeQV,SubstitutionQV,SubstitutionTag --losslessframes
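If you want to look at the values yourself, here is a rough Python sketch for reading them back out of the bax2bam output. It assumes pysam is installed, that IPD and PulseWidth are stored in the "ip" and "pw" tags described in the PacBio BAM spec, and that without --losslessframes they use PacBio's 8-bit lossy frame encoding (exact up to 63 frames, then coarser steps up to 952). The function names are hypothetical, not part of blasr or pbbam, so please check the decoding against the spec for your version.

```python
# Sketch (assumptions): pysam is installed, kinetics live in the "ip"/"pw"
# tags, and values use the 8-bit lossy frame codec unless --losslessframes
# was given to bax2bam. All function names here are hypothetical.


def decode_frames(code: int) -> int:
    """Convert one 8-bit lossy-codec value back to a frame count."""
    if not 0 <= code <= 255:
        raise ValueError(f"codec value out of range: {code}")
    if code < 64:  # codes 0-63: exact frame counts 0-63
        return code
    if code < 128:  # frames 64-190 in steps of 2
        return 64 + (code - 64) * 2
    if code < 192:  # frames 192-444 in steps of 4
        return 192 + (code - 128) * 4
    return 448 + (code - 192) * 8  # frames 448-952 in steps of 8


def _tag_frames(read, tag: str, lossless: bool):
    """Return decoded frame counts for one tag, or None if the tag is absent."""
    if not read.has_tag(tag):
        return None  # e.g. "pw" is missing unless PulseWidth was requested
    values = list(read.get_tag(tag))
    return values if lossless else [decode_frames(v) for v in values]


def read_kinetics(bam_path: str, lossless: bool = False):
    """Yield (read name, IPD frames, PulseWidth frames) for each BAM record."""
    import pysam  # lazy import so decode_frames works without pysam

    # An unaligned BAM has no @SQ header lines; check_sq=False allows that.
    with pysam.AlignmentFile(bam_path, "rb", check_sq=False) as bam:
        for read in bam:
            yield (
                read.query_name,
                _tag_frames(read, "ip", lossless),
                _tag_frames(read, "pw", lossless),
            )
```

For example, `for name, ipd, pw in read_kinetics("movie.subreads.bam"): ...`, where the file name is a placeholder for whatever bax2bam wrote under your output prefix. Pass lossless=True if you did use --losslessframes.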

You can read more about the PacBio .bam file format here:

2) If you have an aligned cmp.h5 file (or a .sam alignment that you convert to a cmp.h5 file via samtoh5), you can use loadPulses to add the pulse (kinetic) information, such as IPD, that base modification detection relies on:

loadPulses /path/to/file.bas.h5 /path/to/blasr.alignment.cmp.h5

bax2bam, loadPulses, and samtoh5 are part of the blasr package:

You can then use R-kinetics to parse and work with the base modification information in the alignment:

My understanding is that PacBio may not continue to maintain samtoh5 (as they move to the .bam file format), but you can find it under /path/to/blasr/utils (if compiled). The same is true for loadPulses, unless you compile using pitchfork.

ndunofficial (United States) wrote, 4.5 years ago:

The bax.h5 files contain the pulse width (PW) as well as the interpulse duration (IPD), i.e. the time between consecutive pulses. Together these give you the timing information between incorporations.
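A rough h5py sketch of pulling those values straight out of a bax.h5 file. It assumes h5py is installed and that the layout matches what I remember from RS II files: per-base IPD in PulseData/BaseCalls/PreBaseFrames, pulse width in PulseData/BaseCalls/WidthInFrames, and per-ZMW base counts in PulseData/BaseCalls/ZMW/NumEvent. Please check with `h5ls -r` before relying on those paths. The per-base arrays are concatenated across all ZMWs, so they have to be split using the NumEvent counts. These are whole polymerase reads in raw frames; you would still need the region table to trim adapters and low-quality sequence. The function names are hypothetical.

```python
# Sketch (assumptions): h5py is installed and the bax.h5 dataset paths below
# match your files; verify them with `h5ls -r file.bax.h5`.
# All function names here are hypothetical.

BASECALLS = "PulseData/BaseCalls"


def split_by_zmw(values, num_events):
    """Split a flat per-base array into one slice per ZMW using NumEvent counts."""
    slices = []
    start = 0
    for n in num_events:
        slices.append(values[start:start + n])
        start += n
    if start != len(values):
        raise ValueError("NumEvent counts do not match the per-base array length")
    return slices


def read_bax_kinetics(bax_path):
    """Return {hole number: (IPD frames, pulse-width frames)} for each ZMW."""
    import h5py  # lazy import so split_by_zmw works without h5py

    with h5py.File(bax_path, "r") as f:
        calls = f[BASECALLS]
        ipd = calls["PreBaseFrames"][:]  # frames before each base (IPD)
        pw = calls["WidthInFrames"][:]  # pulse width in frames (PW)
        holes = calls["ZMW/HoleNumber"][:]
        counts = calls["ZMW/NumEvent"][:]
    per_zmw = zip(holes, split_by_zmw(ipd, counts), split_by_zmw(pw, counts))
    return {int(hole): (i, p) for hole, i, p in per_zmw}
```

For example, `kinetics = read_bax_kinetics("/path/to/file.1.bax.h5")` gives one (IPD, PW) pair of arrays per ZMW hole number.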
