Question

FASTQC and PacBio reads

1

8.6 years ago

tptacek3050 ▴ 70

I'm working with a group that did some PacBio sequencing to aid in the assembly of some bacterial genomes.

The first priority for this group is to assess the quality of the read data. One of the formats the read data were returned in was FASTQ. We already have a pipeline for fastqc, so I first tried running the read files through it. However, fastqc keeps crashing with a Java out-of-memory (heap space) error. I have two questions:

One, how do I increase the Java memory allocation for fastqc?

Two, even though the PacBio reads are in FASTQ format, should I even be using fastqc? Is there a better program? This is my first time handling PacBio data, so I'm sorry if these are very basic questions.

fastqc memory java PacBio • 8.7k views

updated 20 months ago by Ram 43k • written 8.6 years ago by tptacek3050 ▴ 70

1

To increase the Java memory allocation, add this line to your job script before calling fastqc: export _JAVA_OPTIONS=-Xmx<memory allocation>
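As a rough sketch of what that could look like in a job script: the 4 GB heap size and the file name reads.fastq below are only placeholders I picked for illustration, so adjust both to your own node and data. HotSpot JVMs read _JAVA_OPTIONS at startup and normally print a "Picked up _JAVA_OPTIONS" notice, which is a quick way to check that the setting was seen.

```shell
# Assumption: 4 GB of heap is available on the node; change as needed.
export _JAVA_OPTIONS="-Xmx4g"

# reads.fastq is a placeholder input name, not a file from this thread.
# The guard keeps the script from failing where fastqc or the file is absent.
if command -v fastqc >/dev/null 2>&1 && [ -f reads.fastq ]; then
    fastqc reads.fastq
fi
```

The export has to come before the fastqc call so that the JVM started by fastqc inherits it.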

updated 20 months ago by Ram 43k • written 5.0 years ago by JimmyB ▴ 10

0

-Xmx is the option that sets the maximum heap size available to the JVM (i.e., to Java programs).

updated 20 months ago by Ram 43k • written 8.6 years ago by cpad0112 21k

Ram · Answer 1 · 2015-10-07

2

8.6 years ago

Brian Bushnell 20k

I don't think fastqc is appropriate for PacBio data. None of the charts would be useful, in my opinion, even if it did work.

updated 20 months ago by Ram 43k • written 8.6 years ago by Brian Bushnell 20k

0

Hi Brian,

So, could you please recommend a pipeline or package for assessing the quality of PacBio reads? I also started with FastQC, and indeed it didn't provide much useful information.

Thanks,
Ting

updated 20 months ago by Ram 43k • written 8.5 years ago by purplemoonyt • 0

0

The SMRT Pipe utilities should be of use for this. I don't use them myself, so I'm not sure which ones are appropriate, but I recommend you look there first. If you want an empirical analysis of the error rates, I suggest you map the filtered subreads against a reference or assembly using the BBMap package. For example:

mapPacBio.sh in=reads.fastq ref=reference.fasta maxlen=2000bp minlen=100bp mhist=mhist.txt idhist=idhist.txt indelhist=indelhist.txt qhist=qhist.txt qahist=qahist.txt bhist=bhist.txt covhist=covhist.txt

In addition to the histograms, the stderr output from the process will give a summary of the overall read quality in terms of insertion, deletion, substitution, and match rates.
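Since that summary goes to stderr, it is easy to lose in a batch job. One way to keep it, sketched here with only a few of the flags from the command above, is to redirect stderr to a file. The name mapping_summary.txt is a hypothetical choice of mine, and reads.fastq and reference.fasta are the same placeholder inputs as above.

```shell
# Hypothetical file name for the captured stderr summary.
summary_file="mapping_summary.txt"

# Run only if BBMap's mapPacBio.sh is on PATH and both inputs exist.
if command -v mapPacBio.sh >/dev/null 2>&1 \
        && [ -f reads.fastq ] && [ -f reference.fasta ]; then
    # 2> sends stderr (where the rate summary is printed) to the file.
    mapPacBio.sh in=reads.fastq ref=reference.fasta \
        idhist=idhist.txt 2> "$summary_file"
fi
```

Afterwards the insertion, deletion, substitution, and match rates can be read from that file alongside the histogram outputs.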

updated 20 months ago by Ram 43k • written 8.5 years ago by Brian Bushnell 20k

Ram · Answer 2 · 2015-10-08

0

8.6 years ago

mjhsieh • 0

Please start from the bax.h5 files instead of the fastq files. You can find the documents here and here. Also look into the HGAP assembly method / protocol.

Hope that helps.

updated 20 months ago by Ram 43k • written 8.6 years ago by mjhsieh • 0