Question: PacBio interpulse duration (IPD) data
13 months ago, bgbrink wrote:

I started working on my first project with PacBio sequencing data, and after two days filled with fruitless googling, missing libraries and failed C compilations, I decided it's time to ask for help.

The information on tools and pipelines for PacBio data is scattered everywhere and most of it seems horribly out of date. I was hoping some of you could help me to get started and other lost souls in the future will hopefully find this post and save some time and frustration.

The data:

I have a set of Primary Analysis Data available, as explained in the first paragraph here.

The tools:

After some struggle, I managed to compile the latest version of blasr and pbalign on Ubuntu 16.04 using pitchfork. I also managed to install the R packages h5r, pbh5, and seqPatch. If anyone is reading this and has trouble with R and HDF5 libraries under Ubuntu, see my question here.

What I don't have:

I don't have access to a server with the SMRT Link platform.

What I would like to do:

I would like to access the interpulse duration (IPD), as explained in this white paper, and preferably access this data in R. I am open to suggestions for other tools/programming languages as well though.

Problem:

I need cmp.h5 files to load the IPDs from the R packages. How do I generate those? When I try to run pbalign, it says "pbalign no longer supports CMP.H5 Output in 3.0". Is there any other way to get to the IPDs without going through cmp.h5 files?

Thank you very much for your time.

(Edit note: I realized my questions were too broad and specified a single problem to start with instead.)

Modified 3 months ago by lr65358 • written 13 months ago by bgbrink

In case you have not discovered it yet, PacBio has a wiki dedicated to training for PacBio data.

Here is technical info about the h5 format PacBio uses and the tools they provide.

Reply modified 13 months ago • written 13 months ago by genomax

I did see the training, but it was not very helpful, since most of it is tailored to PacBio's SMRT Portal/SMRT Link platform (what's the difference anyway?). I will have another look though, since I also missed the Python script you mentioned. Thanks a lot for pointing that out!

Reply written 13 months ago by bgbrink
13 months ago, tjduncan wrote:
  • What organism is your data from?
  • What is the genome size of the organism? What coverage do you have?
  • Which PacBio instrument was your data generated on: RSII (output is a bax.h5 file) or Sequel (output is an unaligned .bam file)?

The instrument that the data was generated on is important, as it determines which bioinformatics tools you can use for analysis.

  • SMRT Portal is designed to be used with RSII data and accepts raw bax.h5 files as input. It was last updated in Nov 2014, so unfortunately it is not actively maintained. You will be reduced to scrolling through outdated GitHub info and the interwebs for analysis help.

  • SMRT Link is designed to be used with Sequel data and accepts an unaligned .bam file as input. It is somewhat backwards compatible with RSII data: there is a bax2bam command that converts RSII data to the same unaligned .bam format as Sequel data. This is sufficient for most applications, but I don't think it works (correct me if I am wrong?) for base mod work, because the IPD info is not conserved upon file conversion.
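The rule of thumb above (RSII gives bax.h5 and goes to SMRT Portal, Sequel gives an unaligned .bam and goes to SMRT Link) can be written down as a small Python check. This is only a minimal sketch: the function name `suggest_platform` and the returned labels are made up for this illustration, and the guess is based purely on the file extension, so a .bam produced by bax2bam from RSII data would still be reported as Sequel-style input.

```python
from pathlib import Path


def suggest_platform(filename):
    """Guess (instrument, analysis suite) from a raw-data file name.

    Hypothetical helper: it only encodes the extension-based rule of thumb
    from this thread and does not inspect the file contents.
    """
    name = Path(filename).name.lower()
    if name.endswith(".bax.h5"):
        # RSII primary analysis output
        return ("RSII", "SMRT Portal")
    if name.endswith(".bam"):
        # Sequel output, or RSII data converted with bax2bam
        return ("Sequel", "SMRT Link")
    return ("unknown", "unknown")


if __name__ == "__main__":
    # File names below are invented examples.
    for example in ["movie.1.bax.h5", "movie.subreads.bam", "reads.fastq"]:
        print(example, "->", suggest_platform(example))
```

The file names in the usage loop are invented; substitute the names of your own primary analysis files.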

I am not aware of a way to get around using a cmp.h5 file for IPD information. It will also likely be hard to avoid downloading and using either SMRT Portal or SMRT Link in one way or another. Luckily, both of them can be used relatively easily on a workstation (depending on the genome size of your organism). SMRT Portal can be run with its GUI on a workstation, and SMRT Link can be run in command-line-only mode (without having to set up a full SMRT Link server).

SMRT Portal

SMRT Link

  • Here is the download for SMRT Link
  • Here are the official instructions for downloading SMRT Link; see page 8 for the command-line-only tools.
  • Here is a Biostars link to instructions on how to install the command-line-only tools. It is wayyyy better than the official instructions.
  • Here is the SMRT Tools Reference Guide. It is an in-depth list of all the commands available in SMRT Link and their options. Check out pages 41-43 for MotifMaker.

  • Check out pages 52-58 of the reference guide for pbsmrtpipe. It lets you run the whole ds_motif_modification_analysis pipeline, which generates the .csv file seen in the white paper. Specifically, page 58 shows the command that would run this pipeline (with a slight ID modification).

Once you have run a base mod pipeline in either SMRT Portal or SMRT Link (via pbsmrtpipe), you should have output .csv, .gff, and cmp.h5 files that you can do tertiary analysis on using whatever you want. There are also a few tools available from PacBio that run downstream of the initial analysis.

  • PacBio Base Mod tools: this is the link to their GitHub page of additional tools. It looks like only kineticsTools, MotifMaker, and MotifFinder have been updated for Sequel data.

There are also a handful of methods developed by other researchers that use IPD / base mod data from PacBio sequencing. You could look at some of the published papers to get additional ideas.
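As a starting point for home-grown tertiary analysis of the output .csv, here is a minimal Python sketch that keeps only positions with a high IPD ratio and enough coverage. The column names (`refName`, `tpl`, `strand`, `base`, `ipdRatio`, `coverage`) are an assumption about what a kineticsTools-style file contains, so check the header of your own file first. The cutoffs and the three sample rows are invented purely for illustration.

```python
import csv
import io

# Invented sample rows; a real file will have more columns and many more rows.
SAMPLE_CSV = """refName,tpl,strand,base,ipdRatio,coverage
chr1,101,0,A,3.20,45
chr1,102,0,C,1.05,44
chr1,103,1,A,2.60,8
"""


def candidate_sites(handle, min_ratio=2.0, min_coverage=25):
    """Return (refName, tpl, strand, base) for rows passing both cutoffs.

    The default cutoffs are arbitrary placeholders, not recommended values.
    """
    hits = []
    for row in csv.DictReader(handle):
        ratio = float(row["ipdRatio"])
        coverage = int(row["coverage"])
        if ratio >= min_ratio and coverage >= min_coverage:
            hits.append((row["refName"], int(row["tpl"]), int(row["strand"]), row["base"]))
    return hits


if __name__ == "__main__":
    print(candidate_sites(io.StringIO(SAMPLE_CSV)))
```

For a real run, pass an open file handle for your own .csv (opened with `newline=""`) instead of the in-memory sample.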


Thanks, this was really helpful. As I mentioned in my original post, I have RSII data available. Thus, I could not use SMRT Link. However, I was able to run the old SMRT analysis on my laptop.

Reply written 13 months ago by bgbrink

I have PacBio data generated on a Sequel (.bam) for a plant genome. I want to analyze the methylome of this plant using AgIn. However, I noticed that AgIn requires the modification.csv file, which I do not have. @tjduncan, could you tell me how to generate this file?

Reply written 12 months ago by yangxiaofeihe
3 months ago, lr65358 wrote:

You need an older version of pbalign.

conda install -c bioconda blasr

git clone https://github.com/PacificBiosciences/pbalign.git

cd pbalign

git checkout 6c8618cfee963e2167100cb0b293aedf85f32dcf

sudo pip install .
