This post is a follow-up to a previous post: A: FASTQC and PacBio reads
I am trying to use the PBcR pipeline for the Celera Genome Assembler (v8.3) to perform HGAP for PacBio reads (http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR).
I've got the assembler installed, and I was able to successfully assemble the lambda genome in the example provided on the wiki page (see link above). I then tried running the assembler on my own PacBio reads using the following script:
    #Celera genome assembler directory
    CELERA="~/wgs-8.3rc2/Linux-amd64/bin/"

    #Output directory
    OUT="celera_output_3"

    #Variables from parameters
    FILE=$1
    NAME=$2
    SPEC=$3

    #Raw data directory
    #RAW="raw_data_test_phage"

    #Perl environment variable
    export PERLLIB=~/perl/modules/lib/perl5
    export PERL5LIB=~/perl/modules/lib/perl5

    #Create output directory and switch to it
    mkdir -p $OUT/$NAME
    cd $OUT/$NAME

    #Run assembler
    $CELERA/PBcR -length 5000 -s ../../$SPEC -l $NAME -fastq ../../$FILE genomeSize=50000
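One detail in the script that may be worth checking, assuming it runs under bash: a tilde inside double quotes is not expanded, so a variable assigned as "~/..." holds a literal "~" and stays that way when it is used later. The sketch below shows the difference against "$HOME" and adds a small guard; the function name check_pbcr is hypothetical and not part of the Celera Assembler.

```shell
#!/usr/bin/env bash
# Sketch: compare a quoted tilde with $HOME (path taken from the script above).

# Inside double quotes the tilde is kept as a literal character.
QUOTED="~/wgs-8.3rc2/Linux-amd64/bin"

# $HOME is expanded inside double quotes, giving an absolute path.
EXPANDED="$HOME/wgs-8.3rc2/Linux-amd64/bin"

echo "quoted:   $QUOTED"
echo "expanded: $EXPANDED"

# Hypothetical guard: report early if PBcR is not an executable file
# in the given directory. Returns 0 when found, 1 otherwise.
check_pbcr() {
    if [ -x "$1/PBcR" ]; then
        return 0
    fi
    echo "PBcR not found or not executable in: $1" >&2
    return 1
}
```

Calling `check_pbcr "$EXPANDED"` before the assembler line would make a wrong path visible immediately instead of somewhere in the logs.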
I do not get an asm.asm or asm.qc file. I also don't see any obvious errors in the log files. Then again, the log file that the Celera Assembler produces is quite long, and I may be missing something. The structure of the output (i.e. files and directories) looks like this:
    |-- [NAME]
    |   |-- 0-mercounts
    |   |-- 0-mertrim
    |   |-- 0-overlaptrim
    |   |-- 0-overlaptrim-overlap
    |   |-- 1-overlapper
    |   |-- 3-overlapcorrection
    |   |-- 4-unitigger
    |   |-- 5-consensus
    |   |-- 5-consensus-coverage-stat
    |   |-- 5-consensus-insert-sizes
    |   |-- asm.gkpStore
    |   |-- asm.gkpStore.err
    |   |-- asm.gkpStore.errorLog
    |   |-- asm.gkpStore.fastqUIDmap
    |   |-- asm.gkpStore.info
    |   |-- asm.ovlStore
    |   |-- asm.ovlStore.err
    |   |-- asm.ovlStore.list
    |   |-- asm.tigStore
    |   `-- runCA-logs
    |-- [NAME].correction.err
    |-- [NAME].correction.hist
    |-- [NAME].fasta
    |-- [NAME].fastq
    |-- [NAME].frg
    |-- [NAME].log
    |-- [NAME].longest25.fastq -> [NAME].fastq
    |-- [NAME].longest25.frg -> [NAME].frg
    |-- [NAME].qual
    `-- temp[NAME]
        |-- 1-overlapper
        |-- [NAME].frg
        |-- [NAME].spec
        |-- asm.eidToIID
        |-- asm.gkpStore.err
        |-- asm.gkpStore.errorLog
        |-- asm.gkpStore.fastqUIDmap
        |-- asm.gkpStore.info
        |-- asm.hist
        |-- asm.ignore
        |-- asm.iidToLen
        |-- asm.layout.err
        |-- asm.layout.hist
        |-- asm.layout.success
        |-- asm.ovlStore.err
        |-- asm.ovlStore.list
        |-- asm.seedlength
        |-- asm.split.allEdit
        |-- asm.split.uid
        |-- asm.toerase.err
        |-- asm.toerase.out
        |-- asm.toerase.uid
        |-- asm.totalInputBP
        |-- corrected.log
        |-- runCA-logs
        |-- runCorrection.sh
        `-- runPartition.sh
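Because the log is long, one way to narrow the search is to grep the .err and .log files and the runCA-logs directories shown in the listing above. The sketch below is only that: the function name scan_logs and the keyword list are assumptions for illustration, not messages documented for the Celera Assembler.

```shell
#!/usr/bin/env bash
# Sketch: print lines in log-like files that contain failure-sounding words.
# The keyword list is a guess; adjust it to whatever the logs actually say.

scan_logs() {
    # Directory to search; defaults to the current directory.
    local dir="${1:-.}"

    # Look at *.err and *.log files and at every file under a runCA-logs
    # directory. grep prints file name, line number and the matching line.
    # "|| true" keeps the function's exit status at 0 when nothing matches.
    find "$dir" -type f \
        \( -name '*.err' -o -name '*.log' -o -path '*/runCA-logs/*' \) \
        -exec grep -n -i -H -E 'error|fail|abort|segmentation|no such file' {} + \
        || true
}
```

Running `scan_logs celera_output_3/$NAME` from the directory that holds the script would print any hits as file:line:text, which makes it easier to see which stage to read in full.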
So my questions are as follows:
-Why am I not getting an asm.asm (the assembly, I assume) or an asm.qc (assembly statistics) file?
-If the assembly failed, where in the logs can I find an indication of why it failed?
-The lambda example included a parameter called -partitions. What is this parameter? I couldn't find an explanation for it, and I didn't include it in my script.
-The raw data that we received all had the suffix .subreads.fastq. Is there a post-processing step that needs to be run before I run the assembly?