Question: Pacbio Quiver Consensus - How To Use?
5.8 years ago by
darxsys190
Croatia
darxsys190 wrote:

I'm working on a project in which I need to simulate mapping short reads to long reads of a genome. I came across this page: https://github.com/PacificBiosciences/GenomicConsensus/blob/master/doc/HowToQuiver.rst which describes software for the consensus phase of the mapping. However, since this is my first time doing this, I don't know how to use it. I see that the program wants a cmp.h5 file as input, but how can I generate such a file? Which tools produce these files? I know the format originates from PacBio, but how can I produce one?

For example, I have a whole E. coli genome. I then simulate sequencing it with PBSim to produce very short (100 bp) and very long (10 kbp) reads in FASTQ format. Now I would like to map the short reads to each long read, and I need consensus software for that. Actually, I don't even know which software to use for the first phase (before consensus), the one that would, I assume, give me the cmp.h5 file Quiver needs. Any help appreciated.

modified 5.8 years ago by mchaisso160 • written 5.8 years ago by darxsys190

It would help if you explained the purpose of the exercise. Are you trying to error-correct the long reads, as is done in the pre-assembly step of HGAP https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP or in the PacBioToCA http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=PacBioToCA pipeline?

written 5.8 years ago by lexnederbragt1.2k
5.8 years ago by
Tky990
Japan
Tky990 wrote:

Please take a look at Allora, from the SMRT Analysis tool:

Allora, short for "a long read assembler," is PacBio's de novo assembly algorithm. Based on the open source assembly software package AMOS as well as other components tailored to PacBio’s long reads and error profile, Allora uses an overlap-layout-consensus approach to iteratively assemble raw reads into contigs and then outputs them as Fasta sequence and cmp.h5 files.

modified 5.8 years ago • written 5.8 years ago by Tky990

Thanks for the help. However, I can't seem to find any link to download Allora or the SMRT Analysis tool.

written 5.8 years ago by darxsys190

Take a look at the download section of the following page: http://pacbiodevnet.com/

written 5.8 years ago by Tky990

The Quiver documentation says that the input should be aligned reads in .cmp.h5 or .bam format. Aligned to what? I had some trouble running pbalign. Do you think it would be possible to use a different mapper?

written 3.0 years ago by kamiljaron120

Maybe it is too late to add a reply here, but I also had a lot of trouble understanding this, so if someone is still struggling, maybe they will find some hope in my answer!

Before using Quiver, you should produce a cmp.h5 file, which corresponds to an alignment of your PacBio reads against a reference (your genome assembly, for example).

Here is something you can try: pbalign --forQuiver your_movie.bas.h5 your_reference.fasta out.cmp.h5

I think that if you have several bas.h5 files, you can provide a fofn file (a file containing the paths of all your bas.h5 files).

I think other mappers won't be able to read the PacBio-specific bax.h5/bas.h5 format. I heard that they want to get rid of this strange format: https://github.com/PacificBiosciences/GenomicConsensus/blob/master/doc/HowTo.rst

In this link they say:

(...) This is inefficient and users attempting to do this have run into many problems with the instability of the HDF5 library (which PacBio is moving away from, in favor of BAM.)

Maybe one day things will be easier! But for now you have to start with pbalign, and then you can use Quiver!
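
For anyone who wants to script the two steps described above (align with pbalign, then call consensus with Quiver), here is a minimal Python sketch. It only writes a fofn and builds the command lines; it does not run anything. The file names are placeholders, the helper names are made up for illustration, and the Quiver options (-r for the reference, -o for the output) are what I remember from the HowToQuiver document, so please check them against `quiver --help` for your installed version.

```python
import shlex
import tempfile
from pathlib import Path


def write_fofn(bas_files, fofn_path):
    """Write a 'file of file names': one input path per line."""
    fofn_path = Path(fofn_path)
    fofn_path.write_text("".join(f"{Path(p)}\n" for p in bas_files))
    return fofn_path


def pbalign_command(reads, reference, out_cmp_h5):
    """Build the pbalign call quoted in this thread (reads may be a bas.h5 or a fofn)."""
    return ["pbalign", "--forQuiver", str(reads), str(reference), str(out_cmp_h5)]


def quiver_command(cmp_h5, reference, consensus_fasta):
    """Build a Quiver call. ASSUMPTION: -r/-o as in HowToQuiver; verify locally."""
    return ["quiver", str(cmp_h5), "-r", str(reference), "-o", str(consensus_fasta)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder movie names; replace with your real bas.h5 paths.
        fofn = write_fofn(["movie1.bas.h5", "movie2.bas.h5"], Path(tmp) / "input.fofn")
        steps = [
            pbalign_command(fofn, "your_reference.fasta", "out.cmp.h5"),
            quiver_command("out.cmp.h5", "your_reference.fasta", "consensus.fasta"),
        ]
        for cmd in steps:
            # Print the commands so they can be inspected before running them.
            print(shlex.join(cmd))
```

Once the printed commands look right, each list can be passed to `subprocess.run(cmd, check=True)` on a machine where SMRT Analysis is installed.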

written 2.6 years ago by Roxane Boyer890
5.8 years ago by
mchaisso160
United States
mchaisso160 wrote:

Some notes: Quiver is typically used at the end of an assembly, after overlaying the reads back on the assembly with an alignment. If you are looking into ways to do hybrid assembly, consider PacBioToCA http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=PacBioToCA. If you are only doing bacterial assembly, I wouldn't pursue this too much, as most prokaryote genomes assemble into one or a few contigs without short-read error correction: http://www.cbcb.umd.edu/software/PBcR/closure/report.log.krona.html .

Also, Quiver uses all of the quality values (InsertionQV, DeletionQV, SubstitutionQV, and MergeQV) stored in the bas.h5 files in order to have optimal consensus calling. PBSim only generates FASTQ. While I believe it is possible to use Quiver on this data, the results will be inferior to using the real data. There is a read simulator called "alchemy" that is tucked in with the blasr distribution on github (under the subdir 'simulator') that simulates all of these quality values, but it needs real data to train an error model on. Also, I've never tested the output of alchemy as input for Quiver, so I can't vouch that it works.

-mark
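
To make the point about quality values concrete: a FASTQ record carries a single quality string per read, while the post above lists four separate per-base tracks that Quiver reads from bas.h5. Below is a minimal sketch that parses a FASTQ record and decodes its one quality string. It assumes plain four-line records and Phred+33 encoding (an assumption about the simulator's output, so check your files), and the function names are made up for illustration.

```python
import io

# The per-base tracks named in the post above; a FASTQ record has no fields for them.
QUIVER_TRACKS = ("InsertionQV", "DeletionQV", "SubstitutionQV", "MergeQV")


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) ends the iteration
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def phred33(qual):
    """Decode a quality string into integer scores. ASSUMPTION: Phred+33 encoding."""
    return [ord(c) - 33 for c in qual]


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\nI5+!\n")
    for name, seq, qual in read_fastq(example):
        print(name, seq, phred33(qual))
    print("separate tracks expected from bas.h5:", ", ".join(QUIVER_TRACKS))
```

The single decoded list per read is all a FASTQ can offer, which is why simulated FASTQ input is expected to give weaker consensus calls than real data.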

written 5.8 years ago by mchaisso160
5.8 years ago by
cts1.6k
Pasadena
cts1.6k wrote:

I don't use PacBio, so this isn't going to be a complete answer for you, but from the file that you link to it seems that the cmp.h5 file is generated by the PacBio base-calling software. However, the file is in HDF5 format, which is open source. You could check out this page for a specification of the cmp.h5 file format and possibly make the file yourself.

written 5.8 years ago by cts1.6k