Question: PacBio: extract fastq from h5 file based on quality filtering
4.1 years ago by
United States
merodev140 wrote:

Hi, I am new to PacBio and have two sets of .h5 files output by the PacBio instrument. I am planning to use the Celera Assembler, and for that I need fastq files from the .h5 files.

1) Is there any way to convert .h5 to fastq?

2) Is there a specific method to filter PacBio reads based on quality?

3) Should we combine both sets of data and then work on the assembly?


The quality values are low enough that Celera may artificially trim the reads. I've found it's best to just fake a fastq from the fasta, assigning a quality value high enough that the reads are retained. The assembly quality then needs to be improved later using Quiver.
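A minimal sketch of that "fake fastq" step, assuming a plain FASTA input and Sanger (Phred+33) quality encoding. The function name fasta_to_fastq and the default quality character I (Q40 under Phred+33) are illustrative choices, not values from the post; whether that is high enough to avoid trimming depends on your Celera settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a FASTA file ($1) into fastq, giving every base
# the same quality character $2 (default 'I', i.e. Q40 under Phred+33).
# Avoid '&' and '\' as the character: gsub treats them specially in its
# replacement string.
fasta_to_fastq() {
  awk -v q="${2:-I}" '
    function emit(   qual) {
      qual = seq
      gsub(/./, q, qual)            # one quality character per base
      printf "@%s\n%s\n+\n%s\n", name, seq, qual
    }
    /^>/ { if (have) emit(); name = substr($0, 2); seq = ""; have = 1; next }
    { seq = seq $0 }                # join wrapped sequence lines
    END { if (have) emit() }
  ' "$1"
}
```

Usage: `fasta_to_fastq reads.fasta I > reads.fastq`. Using gsub on a copy of the sequence builds the quality string in one pass, which matters for long PacBio reads.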

4.1 years ago by
State College, PA, USA
Biomonika (Noolean)3.0k wrote:

1) and 2) Use --minLength 500 --readType subreads --minReadScore 0.8 --outType fastq 
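The flags above are applied while extracting from the .h5 file, where the per-read score is available. If you already have a fastq file and only want to reapply the length cut-off, a minimal awk sketch follows. It assumes unwrapped four-line fastq records, does not reproduce --minReadScore (that score is not stored in the fastq), and the function name filter_min_length is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only fastq records from $1 whose sequence is
# at least $2 bases long (default 500). Assumes each record is exactly
# four lines (header, sequence, plus line, qualities), i.e. no wrapping.
filter_min_length() {
  awk -v min="${2:-500}" '
    NR % 4 == 1 { head = $0 }       # @read header
    NR % 4 == 2 { seq = $0 }        # sequence line
    NR % 4 == 3 { plus = $0 }       # "+" separator line
    NR % 4 == 0 {                   # quality line closes the record
      if (length(seq) >= min + 0)
        printf "%s\n%s\n%s\n%s\n", head, seq, plus, $0
    }
  ' "$1"
}
```

Usage: `filter_min_length raw_reads.fastq 500 > raw_reads.min500.fastq`.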

3) It depends on your dataset, but if you just sequenced two SMRT cells to get more coverage, you can merge them before assembly.

4.1 years ago by
thackl2.6k wrote:

Have a look at dextract. It's very quick and lets you set a score cutoff. However, I think it only generates FASTA.

dextract can generate FASTQ if you add the -q parameter. To filter for a minimum read quality of 0.80, use -s800 (the default is 750):

dextract -q *.bax.h5 -s800 > raw_reads_RQ0.80.fastq



How can this be combined with the find command, e.g. find All_RawData/Each_Cell_Raw/ -name "*.bax.h5" | xargs -I {} dextract -q {} > ? How do I get each input file name to build the output file name?
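One way to get a per-file output name is to drop xargs and loop over the find results in the shell, deriving each fastq name from its input with basename. A minimal sketch, assuming bash and that dextract accepts -q and -s800 as in the command above; the function names fastq_name and extract_all and the .RQ0.80.fastq suffix are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive an output name from a .bax.h5 path, e.g.
#   All_RawData/Each_Cell_Raw/m1.1.bax.h5 -> m1.1.RQ0.80.fastq
fastq_name() {
  local base
  base=$(basename "$1" .bax.h5)
  printf '%s.RQ0.80.fastq\n' "$base"
}

# Hypothetical helper: run dextract once per .bax.h5 file under directory $1,
# writing each file's reads to its own fastq in the current directory.
# -print0 / read -d '' keeps paths with spaces intact.
extract_all() {
  local h5
  find "$1" -name '*.bax.h5' -print0 |
    while IFS= read -r -d '' h5; do
      dextract -q -s800 "$h5" > "$(fastq_name "$h5")"
    done
}
```

Usage: `extract_all All_RawData/Each_Cell_Raw`. Because the redirection happens inside the loop, each input gets its own output file, which a single `> file` after xargs cannot do.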

4.1 years ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche18k wrote:

I've written a Perl wrapper for the HDF5 library that you might find useful. It hasn't been tested on PacBio files, though I have no reason to think it couldn't read them.

2.4 years ago by
mehmetgoktay19890 wrote:

Could you please tell me if it also removes adapter sequences?

I have just used it and got subreads from the raw data, but I am not sure whether the subreads still contain adapter sequences.

You need to post this as a separate question; also, please refer to the manual.

