I recently started working on my first project with PacBio sequencing data, and after two days of fruitless googling, missing libraries and failed C compilations, I decided it was time to ask for help.
The information on tools and pipelines for PacBio data is scattered everywhere, and most of it seems horribly out of date. I was hoping some of you could help me get started, and that other lost souls will find this post in the future and save themselves some time and frustration.
I have a set of Primary Analysis Data available, as explained in the first paragraph here.
After some struggle, I managed to compile the latest version of blasr and pbalign on Ubuntu 16.04 using pitchfork. I also managed to install the R packages h5r, pbh5, and seqPatch. If anyone is reading this and has trouble with R and HDF5 libraries under Ubuntu, see my question here.
What I don't have:
I don't have access to a server with the SMRT Link platform.
What I would like to do:
I would like to access the interpulse duration (IPD), as explained in this white paper, preferably in R. I am also open to suggestions for other tools or programming languages.
The R packages need cmp.h5 files to load the IPDs. How do I generate those? When I try to run pbalign, it says:
"pbalign no longer supports CMP.H5 Output in 3.0."
Is there any other way to get to the IPDs without going through cmp.h5 files?
Thank you very much for your time.
(Edit note: I realized my original questions were too broad, so I narrowed them down to a single problem to start with.)