FASTQC and PacBio reads
8.6 years ago
tptacek3050 ▴ 70

I'm working with a group that did some PacBio sequencing to aid in the assembly of some bacterial genomes.

The first priority for this group is to assess the quality of the read data. One of the formats the read data were returned in was FASTQ. We already have a pipeline for fastqc, so I first tried running the read files through it. However, fastqc keeps crashing with a Java out-of-memory (heap space) error. I have two questions:

One, how do I increase the Java memory allocation for fastqc?

Two, even though the PacBio reads are in the form of fastq files, should I even be using fastqc? Is there a better program? This is my first time handling PacBio data, so I'm sorry if these are very basic questions.

fastqc memory java PacBio • 8.7k views

To increase the Java memory allocation, add this line to your job script before calling fastqc, replacing <memory allocation> with a size such as 4g: export _JAVA_OPTIONS=-Xmx<memory allocation>
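As a rough sketch of what that could look like in a job script: the 4g value and the reads.fastq file name below are placeholders I've assumed for illustration, not values from the original question, so adjust both to your data and to what your node can spare.

```shell
# Assumed value: allow the JVM up to 4 GB of heap. Pick a size that fits your node.
export _JAVA_OPTIONS="-Xmx4g"

# reads.fastq is a placeholder file name. The guard only skips the call
# when fastqc is not on the PATH.
if command -v fastqc >/dev/null 2>&1; then
    fastqc reads.fastq
else
    echo "fastqc not found on PATH" >&2
fi
```

When the variable is picked up, the HotSpot JVM prints a "Picked up _JAVA_OPTIONS: ..." line to stderr, which is a quick way to check that the setting reached fastqc.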


-Xmx is the option that sets the maximum heap size available to the JVM (i.e., to Java programs).

8.6 years ago

I don't think fastqc is appropriate for PacBio data. None of the charts would be useful, in my opinion, even if it did work.


Hi Brian,

So, would you please recommend some pipeline/package to assess the quality of the PacBio reads? I have also started with FastQC and indeed it didn't provide much useful information.

Thanks,
Ting


The SMRT Pipe utilities should be of use for this, though I don't use them so I'm not sure which ones are appropriate. But I recommend you look there first. If you want an empirical analysis of the error rates, I suggest you map the filtered subreads against a reference or assembly using the BBMap package. For example:

mapPacBio.sh in=reads.fastq ref=reference.fasta maxlen=2000bp minlen=100bp mhist=mhist.txt idhist=idhist.txt indelhist=indelhist.txt qhist=qhist.txt qahist=qahist.txt bhist=bhist.txt covhist=covhist.txt

In addition to the histograms, the stderr output from the process will give a summary of the overall read quality in terms of insertion, deletion, substitution, and match rates.

8.6 years ago
mjhsieh • 0

Please start from the bax.h5 files instead of the fastq files. You can find the documents here and here. Also look into the HGAP assembly method / protocol.

Hope that helps.
