Question: Fishing out specific sequences from large PacBio bax.h5 files
roblogan6 wrote (2.6 years ago):

I have 125,000 individual reads from PacBio in fasta format, processed from bax.h5 files. I have clustered these reads based on unique molecular identifiers. I would now like to align these individual reads per cluster to a reference genome using the PacBio SMRT portal module blasr.

I am interested in using the bax.h5 information for the alignment rather than just the fasta files. Is there any way I can use the fasta headers to build a whitelist and fish the associated read information out of the large bax.h5 files?

When I use ConsensusTools to generate a Long Amplicon Analysis, for example, there are command line options that take a "file of file names" and then fetch the information for a whitelist. There are no such options for blasr, but I wonder if there is a way to do the filtering beforehand. How can I pass only a small, defined subset of reads from the large bax.h5 files to blasr? Thanks for any help or suggestions.
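For the whitelist step itself, here is a minimal sketch in plain Python. It assumes the fasta headers still carry the usual PacBio read-name layout, `<movie>/<holeNumber>/<start>_<end>`, which may not hold if the clustering step renamed the reads. The function names `parse_header` and `build_whitelist` are made up for illustration and are not part of any PacBio tool.

```python
import re
from collections import defaultdict

# Assumed PacBio read-name layout: <movie>/<holeNumber>[/<start>_<end>]
# Anything after the first whitespace in the header is ignored.
READ_NAME = re.compile(r"^(?P<movie>[^/\s]+)/(?P<hole>\d+)(?:/\d+_\d+)?")


def parse_header(header):
    """Return (movie, hole_number) for one fasta header, or None if it does not match."""
    match = READ_NAME.match(header.lstrip(">").strip())
    if match is None:
        return None
    return match.group("movie"), int(match.group("hole"))


def build_whitelist(fasta_lines):
    """Map each movie name to the set of hole numbers seen in the header lines."""
    whitelist = defaultdict(set)
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        parsed = parse_header(line)
        if parsed is not None:
            movie, hole = parsed
            whitelist[movie].add(hole)
    return dict(whitelist)
```

An open file handle can be passed directly as `fasta_lines`, one cluster fasta at a time. Grouping by movie name matters because each bax.h5 file belongs to a single movie, so the hole numbers only make sense per movie.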

Tags: pacbio, blasr, next-gen, alignment
roblogan6 wrote (2.6 years ago):

I emailed PacBio technical support about this and got the following response, posted here for anyone who might be having the same problem:

My name is Roberto Lleras, Bioinformatics FAS Manager at PacBio. I'd be happy to answer your question. To manually look through the bax.h5 files and select specific reads to use in BLASR, you'd need to use the pbcore.io Python library and write custom scripts that create new H5 files containing only your filtered reads. Information on the functions of pbcore.io can be found here: http://pacificbiosciences.github.io/pbcore/pbcore.io.html#bas-h5-bax-h5-formats-pacbio-basecalls-file
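A rough sketch of the selection half of such a custom script follows. It assumes pbcore's `BasH5Reader` exposes `sequencingZmws`, indexing by hole number, `Zmw.subreads`, `readName` and `basecalls()` as described in the pbcore.io documentation linked above, so check those against your installed version. The helper names `open_bax` and `iter_whitelisted_subreads` are hypothetical, and writing the selected reads back out as a new H5 file is not covered here.

```python
def open_bax(path):
    """Open a bax.h5/bas.h5 file with pbcore (imported lazily; requires pbcore)."""
    from pbcore.io import BasH5Reader
    return BasH5Reader(path)


def iter_whitelisted_subreads(reader, read_names):
    """Yield the subread objects from `reader` whose readName is in `read_names`.

    `read_names` are full names of the assumed form <movie>/<hole>/<start>_<end>.
    """
    wanted = set(read_names)
    # Hole number is the second slash-separated field of the read name.
    holes = set(int(name.split("/")[1]) for name in wanted)
    # Only visit holes this file reports as having produced sequence.
    available = set(int(h) for h in reader.sequencingZmws)
    for hole in sorted(holes & available):
        for subread in reader[hole].subreads:
            if subread.readName in wanted:
                yield subread
```

With a real file this would be called as `iter_whitelisted_subreads(open_bax(path), names)`, where `path` and `names` are your own inputs. Yielding the subread object rather than just the sequence keeps whatever per-base information pbcore exposes on it within reach, which is the reason for going back to the bax.h5 file instead of the fasta.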

Alternatively, you can align everything and then filter poor alignments with the cmph5tools.py software included with SMRTAnalysis. Information on filtering datasets with cmph5tools.py can be found here: https://github.com/PacificBiosciences/pbh5tools/blob/master/doc/cmph5tools-examples.rst
