Question: Pacbio: extract fastq from h5 file based on quality filtering
merodev (United States) wrote, 5.0 years ago:

Hi, I am new to PacBio and have 2 sets of .h5 files as output from the sequencer. I am planning to use the Celera Assembler, and for that I need FASTQ files from the .h5 files.

1) Is there any way to convert .h5 to FASTQ?

2) Is there any specific method to filter PacBio reads based on quality?

3) Should both sets of data be combined before working on the assembly?



The quality values are low enough that Celera may artificially trim the reads. I've found it's best to just fake a FASTQ from the FASTA, with quality values high enough that the reads are retained. The assembly quality then needs to be improved later using Quiver.

Reply by mchaisso, written 5.0 years ago
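The "fake FASTQ from FASTA" idea in the reply above could be sketched roughly as follows. This is only an illustration: the function name `fasta_to_fake_fastq` is hypothetical, and the default quality character `I` (Phred 40 under the Phred+33 encoding) is an assumption, so pick whatever value your assembler's trimming treats as high enough.

```shell
# Hypothetical helper: convert FASTA to FASTQ with one constant placeholder
# quality character for every base. Multi-line sequences are joined.
# $1 = input FASTA, $2 = quality character (assumed default: I = Phred 40, Phred+33)
fasta_to_fake_fastq() {
    awk -v q="${2:-I}" '
        # Print the record collected so far as a 4-line FASTQ entry.
        function emit() {
            if (name != "") {
                qual = seq
                gsub(/./, q, qual)   # one quality character per base
                printf "@%s\n%s\n+\n%s\n", name, seq, qual
            }
        }
        /^>/ { emit(); name = substr($0, 2); seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    ' "$1"
}
```

It would be called as `fasta_to_fake_fastq reads.fasta > reads.fastq`. The qualities carry no real information, which is why the consensus still has to be polished afterwards.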
Biomonika (Noolean) (State College, PA, USA) wrote, 5.0 years ago:

1) and 2): Use --minLength 500 --readType subreads --minReadScore 0.8 --outType fastq

3) It depends on your dataset, but if you just sequenced 2 SMRT cells to get more coverage, you can merge them prior to assembly.

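Merging the reads from two SMRT cells before assembly, as suggested above, usually amounts to concatenating the per-cell FASTQ files. A minimal sketch, assuming plain uncompressed FASTQ with exactly 4 lines per record and a trailing newline at the end of each file; both function names are hypothetical:

```shell
# Hypothetical helper: print the number of records in one FASTQ file
# (assumes exactly 4 lines per record).
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Hypothetical helper: concatenate per-cell FASTQ files into one output file.
# $1 = output file, remaining arguments = input FASTQ files
merge_fastq() {
    merged=$1
    shift
    cat "$@" > "$merged"
}
```

For example, `merge_fastq merged.fastq cell1.fastq cell2.fastq` followed by `count_fastq_reads merged.fastq` gives a quick check that the merged count equals the sum of the per-cell counts.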
thackl wrote, 5.0 years ago:

Have a look at dextract. It's very quick and lets you set a score cutoff. However, I think it only generates FASTA.


dextract can generate FASTQ if you add the -q parameter. To filter the FASTQ with a minimum Read Quality of 0.80, use -s800 (default: 750):

dextract -q *.bax.h5 -s800 > raw_reads_RQ0.80.fastq



Reply by rtliu, written 4.9 years ago (modified 4.9 years ago)

How can I combine this with the find command? For example: find All_RawData/Each_Cell_Raw/ -name "*.bax.h5" | xargs -I {} dextract -q {} > How do I get the file name to redirect the output to?

Reply by Ric, written 2.6 years ago (modified 2.6 years ago)
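One way to get a per-file output name is to replace xargs with a small loop, since the redirection target then can be computed from each input path. The sketch below is only an illustration: `fastq_name_for` and `extract_each` are hypothetical names, and the dextract flags (-q, -s800) and the redirection of standard output are copied from the command earlier in this thread, so check them against your installed version.

```shell
# Hypothetical helper: derive an output name from an input path,
# e.g. some/dir/cell.1.bax.h5 -> cell.1.fastq
fastq_name_for() {
    printf '%s.fastq\n' "$(basename "$1" .bax.h5)"
}

# Hypothetical helper: run dextract once per .bax.h5 file found under $1,
# writing one FASTQ per input file into the existing directory $2.
extract_each() {
    find "$1" -name '*.bax.h5' | while IFS= read -r h5; do
        dextract -q -s800 "$h5" > "$2/$(fastq_name_for "$h5")"
    done
}
```

It would be called as `extract_each All_RawData/Each_Cell_Raw/ fastq_out`. Inputs that share a base name in different subdirectories would overwrite each other's output, so the naming rule may need adjusting for such layouts.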

Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote, 5.0 years ago:

I've written a Perl wrapper for the HDF5 library that you might find useful. It hasn't been tested on PacBio files, though I have no reason to think it wouldn't read them.

mehmetgoktay1989 wrote, 3.3 years ago:

Could you please tell me whether it also removes adapter sequences?

I have just used it and got subreads from the raw data, but I am not sure whether the subreads still contain adapter sequences.


You need to post this as a separate question. Also, please refer to the manual.

Reply by Rohit, written 3.3 years ago