Pulse information of PacBio data
8.9 years ago
mhasa006 ▴ 70

I have a set of PacBio long-read data in .bax.h5 format. I want to detect DNA base modifications (DNA methylation) using the pulse information in the PacBio data. I know the pulse information is stored in pls.h5 files, but I don't have those files. Is it possible to extract any pulse information from the .bax.h5 files? Do the *.bax.h5 files contain any pulse information at all?

Methylation PacBio • 4.1k views
8.0 years ago

1) You can create an unaligned .bam file with the pulse information using bax2bam:

path/to/blasr/utils/bax2bam/bin/bax2bam -o outputPrefix /path/to/file.1.bax.h5 /path/to/file.2.bax.h5 /path/to/file.3.bax.h5

By default, IPD information is added but PulseWidth is not. However, you can choose which features to add. For example:

path/to/blasr/utils/bax2bam/bin/bax2bam -o outputPrefix /path/to/file.1.bax.h5 /path/to/file.2.bax.h5 /path/to/file.3.bax.h5  --pulsefeatures=DeletionQV,DeletionTag,InsertionQV,IPD,PulseWidth,MergeQV,SubstitutionQV,SubstitutionTag --losslessframes

You can read more about the PacBio .bam file format here: http://pacbiofileformats.readthedocs.io/en/3.0/BAM.html
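Once you have the .bam, the kinetics sit in per-read tags. A minimal sketch for pulling them out, assuming the tag names from the PacBio BAM spec linked above (ip for IPD, pw for PulseWidth) and that samtools is installed. The function name extract_kinetics and the output file name in the usage comment are my own assumptions for illustration, not part of bax2bam:

```shell
# Hypothetical helper: read SAM records on stdin and print
# read name, IPD values, and PulseWidth values as tab-separated columns.
# Assumes the kinetics are stored in the optional tags ip:B:... and pw:B:...
extract_kinetics() {
  awk -F'\t' '{
    ip = "NA"; pw = "NA"
    # Optional tags start at column 12 of a SAM record.
    for (i = 12; i <= NF; i++) {
      # Drop the 7-character prefix, e.g. "ip:B:C,", and keep the value list.
      if ($i ~ /^ip:B:/) ip = substr($i, 8)
      else if ($i ~ /^pw:B:/) pw = substr($i, 8)
    }
    print $1 "\t" ip "\t" pw
  }'
}

# Usage (file name is an assumption about how bax2bam names its output):
#   samtools view outputPrefix.subreads.bam | extract_kinetics | head
```

Reads without a pw tag (for example, if PulseWidth was not requested) show NA in the last column.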

2) If you have an aligned cmp.h5 file (or a .sam alignment that you convert to a cmp.h5 file via samtoh5), you can use loadPulses to add the pulse information used for base modification detection:

loadPulses /path/to/file.bas.h5 /path/to/blasr.alignment.cmp.h5

bax2bam, loadPulses, and samtoh5 are part of the blasr package:

https://github.com/PacificBiosciences/blasr

You can then use R-kinetics to parse and work with the base modification information in the alignment:

https://github.com/PacificBiosciences/R-kinetics

My understanding is that PacBio may not continue to maintain the samtoh5 tool (as they switch to the .bam file format), but you can find it under /path/to/blasr/utils (if compiled). The same is true for loadPulses, unless you compile using pitchfork.

8.8 years ago
ndunofficial ▴ 10

The bax.h5 files contain the pulse width (PW) as well as the inter-pulse duration (IPD), i.e. the time between consecutive pulses. Together these give you the timing information for each base.
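If you want to check this on your own files, you can list the HDF5 datasets and look for the two kinetic ones. This is only a sketch: it assumes h5ls from the HDF5 command-line tools is installed, and that IPD and PW are stored under the dataset names PreBaseFrames and WidthInFrames, which is how I remember the bax.h5 layout. The function name kinetic_datasets is made up for this example:

```shell
# Hypothetical helper: filter an `h5ls -r` listing (read from stdin) down to
# the datasets assumed to hold IPD (PreBaseFrames) and PW (WidthInFrames).
kinetic_datasets() {
  grep -E '/(PreBaseFrames|WidthInFrames)([[:space:]]|$)'
}

# Usage:
#   h5ls -r /path/to/file.1.bax.h5 | kinetic_datasets
```

If nothing is printed, either the datasets are named differently in your files or the kinetic information was not written.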
