Is it possible to convert fastq format to PacBio bax.h5 file?
9.0 years ago
pengchy ▴ 450

Hi,

I want to align a fastq file to a genome using blasr. However, blasr only accepts fasta or bax.h5 input, and fastq does not contain the quality information. So is it possible to convert the fastq file to bax.h5 format?

Actually, I have sequenced a PacBio transcriptome. Many raw subreads were not used at the clustering step of the IsoSeq pipeline, so I want to align the remaining subreads to the genome to check the alignment quality and the alignment ratio. BLAT can also do this job, and I want to test blasr to compare with it. To do this, I have two choices:

  1. convert fastq to bax.h5 format
  2. extract the remaining subreads from the original bax.h5 file.

For choice 2, the bax.h5 file is bigger than the fastq, and furthermore, how can I extract a subset of this file into bax.h5 format?

Best,
Pengcheng

fastq PacBio • 5.8k views
9.0 years ago
User 59 13k

What makes you think you can't use a fastq file with blasr? From the documentation here: https://github.com/mchaisso/blasr/blob/master/README.md

"Typing blasr -h or blasr -help on the command line will give you a list of options. At the least, provide a fasta, fastq, or bas.h5 file, and a genome."

And I can assure you that correctly formatted fastq files do contain quality information; it is fasta files that lack it.
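
To make the point about quality information concrete, here is a minimal Python sketch that reads a FASTQ record and decodes its quality line. It assumes the common four-lines-per-record layout and Phred+33 encoding; the read name and helper functions are made up for illustration and are not part of blasr.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def phred_scores(qual, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


# Hypothetical single-record FASTQ, held in memory for the demonstration.
example = StringIO("@read1\nACGT\n+\n!+5I\n")
for name, seq, qual in read_fastq(example):
    print(name, seq, phred_scores(qual))  # read1 ACGT [0, 10, 20, 40]
```

Each base has exactly one quality value here, which is less than the separate insertion, deletion and substitution qualities stored in the h5 files.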


Hi Daniel,

Thank you for your reply and the information. I did check the help output after I installed blasr, but I had not read the README.md file. Sorry, I will give it a try.

blasr-master/alignment/bin/blasr -h     

   Options for blasr 
   Basic usage: 'blasr reads.{fasta,bax.h5} genome.fasta [-options] 
 option Description (default_value).
 Input Files.
   reads.fasta is a multi-fasta file of reads.  While any fasta file is valid input, 
               it is preferable to use plx.h5 or bax.h5 files because they contain
               more rich quality value information.
   reads.bax.h5|reads.plx.h5 Is the native output format in Hierarchical Data Format of 
               SMRT reads. This is the preferred input to blasr because rich quality
               value (insertion,deletion, and substitution quality values) information is 
               maintained.  The extra quality information improves variant detection and mapping
               speed.
9.0 years ago

The conversion of h5 to fastq is lossy, so no, it is not possible to recreate the original h5 file from a fastq file.
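
Since the h5 file cannot be rebuilt, one workable alternative for the original goal is to subset the FASTQ itself and pass that subset to the aligner. The following is only a sketch: it assumes a well-formed four-lines-per-record FASTQ and that you already have the names of the subreads left out of clustering. The read names and the `filter_fastq` helper are hypothetical.

```python
from io import StringIO


def filter_fastq(in_handle, out_handle, wanted_names):
    """Copy records whose name (first word after '@') is in wanted_names.

    Returns the number of records written.
    """
    kept = 0
    while True:
        record = [in_handle.readline() for _ in range(4)]
        if not record[0]:
            break  # end of file
        name = record[0][1:].split()[0]
        if name in wanted_names:
            out_handle.writelines(record)
            kept += 1
    return kept


# Hypothetical two-record input; in practice these would be open file handles.
reads = StringIO("@movie/1/0_4\nACGT\n+\nIIII\n@movie/2/0_4\nTTTT\n+\n!!!!\n")
out = StringIO()
kept = filter_fastq(reads, out, {"movie/2/0_4"})
print(kept)
print(out.getvalue(), end="")
```

With real data, replace the `StringIO` objects with files opened for reading and writing, and build `wanted_names` as a set so that lookups stay fast on large read sets.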
