Question: FASTQC and PacBio reads
4.0 years ago by
tptacek305060
United States
tptacek305060 wrote:

I'm working with a group that did some PacBio sequencing to aid in the assembly of some bacterial genomes.

The first priority for this group is to assess the quality of the read data. One of the formats in which the read data were returned was FASTQ. We already have a FastQC pipeline, so I first tried running the read files through it. However, FastQC keeps crashing with a Java out-of-memory (heap space) error. I have two questions:

One, how do I increase the Java memory allocation for FastQC?

Two, even though the PacBio reads are in FASTQ format, should I even be using FastQC? Is there a better program? This is my first time handling PacBio data, so I'm sorry if these are very basic questions.

fastqc pacbio memory java • 4.0k views

To increase the Java memory allocation, add this line to your job script before calling fastqc: export _JAVA_OPTIONS=-Xmx<memory allocation>, where <memory allocation> is the maximum heap size (for example, 4g for 4 GB).

written 5 months ago by jimmybernot0
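
As a minimal sketch of the suggestion above: the snippet sets the heap limit through _JAVA_OPTIONS and then runs FastQC. The 4g value and the file name reads.fastq are assumptions chosen for illustration, not values from this thread, and the check with -XX:+PrintFlagsFinal assumes a HotSpot-based JVM.

```shell
# Raise the maximum heap size for Java programs started from this shell.
# 4g is an illustrative value; pick one that fits your machine and data.
export _JAVA_OPTIONS="-Xmx4g"

# Optional check (HotSpot JVMs): print the effective maximum heap size.
if command -v java >/dev/null 2>&1; then
  java -XX:+PrintFlagsFinal -version 2>/dev/null | grep MaxHeapSize || true
fi

# reads.fastq is a placeholder file name; run FastQC only if both exist.
if command -v fastqc >/dev/null 2>&1 && [ -f reads.fastq ]; then
  fastqc reads.fastq
fi
```

When the variable is picked up, the JVM normally prints a "Picked up _JAVA_OPTIONS" notice to stderr, which is a quick way to confirm the setting reached FastQC's Java process.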
4.0 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

I don't think fastqc is appropriate for PacBio data.  None of the charts would be useful, in my opinion, even if it did work.


Hi Brian,

So, would you please recommend some pipeline/package to assess the quality of the PacBio reads? I have also started with FastQC and indeed it didn't provide much useful information.

Thanks,

Ting

written 4.0 years ago by purplemoonyt0

The SMRT Pipe utilities should be of use for this, though I don't use them so I'm not sure which ones are appropriate.  But I recommend you look there first.  If you want an empirical analysis of the error rates, I suggest you map the filtered subreads against a reference or assembly using the BBMap package.  For example:

mapPacBio.sh in=reads.fastq ref=reference.fasta maxlen=2000bp minlen=100bp mhist=mhist.txt idhist=idhist.txt indelhist=indelhist.txt qhist=qhist.txt qahist=qahist.txt bhist=bhist.txt covhist=covhist.txt

In addition to the histograms, the stderr output from the process will give a summary of the overall read quality in terms of insertion, deletion, substitution, and match rates.

modified 4.0 years ago • written 4.0 years ago by Brian Bushnell16k
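
A reference-free first look can complement the mapping-based approach above. The following is only a sketch, not something proposed in this thread: fastq_length_stats is a hypothetical helper name, and it assumes plain (uncompressed) FASTQ with exactly four lines per record.

```shell
# Hypothetical helper: summarize read lengths in a 4-line-per-record FASTQ file.
fastq_length_stats() {
  awk 'NR % 4 == 2 {                       # every 2nd line of a record is the sequence
         n++; len = length($0); total += len
         if (len > max) max = len
         if (min == "" || len < min) min = len
       }
       END {
         if (n == 0) { print "reads=0"; exit }
         printf "reads=%d bases=%d min=%d mean=%.1f max=%d\n", n, total, min, total / n, max
       }' "$1"
}
```

Usage: fastq_length_stats reads.fastq, where reads.fastq is a placeholder file name. It reports only length statistics; it says nothing about error rates, which still need a reference or assembly as described above.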
4.0 years ago by
cpad011212k
India
cpad011212k wrote:

-Xmx is the option that sets the maximum heap size available to the JVM (i.e., to Java programs).

4.0 years ago by
mjhsieh0
United States
mjhsieh0 wrote:

Please start from the bax.h5 files instead of the fastq files. You can find the documentation here: https://github.com/PacificBiosciences/SMRT-Analysis/wiki/Official-Documentation and here: https://github.com/PacificBiosciences/Bioinformatics-Training/wiki. Also look into the HGAP assembly method/protocol.

Hope that helps.
