Question: Questions about using the Celera Genome Assembler for HGAP
3.4 years ago by
tptacek305060
United States
tptacek305060 wrote:

This post is a followup to a previous post: A: FASTQC and PacBio reads

 

I am trying to use the PBcR pipeline from the Celera Genome Assembler (v8.3) to perform HGAP on PacBio reads (http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR).

 

I've got the assembler installed, and I was able to successfully assemble the lambda genome in the example provided on the wiki page (see link above). I then tried running the assembler on my own PacBio reads using the following script:

#Celera genome assembler directory
#($HOME rather than a quoted "~", which the shell does not expand)
CELERA="$HOME/wgs-8.3rc2/Linux-amd64/bin"

#Output directory
OUT="celera_output_3"

#Variables from parameters: reads (FASTQ), library name, spec file
FILE=$1
NAME=$2
SPEC=$3

#Raw data directory
#RAW="raw_data_test_phage"

#Perl environment variable
export PERLLIB="$HOME/perl/modules/lib/perl5"
export PERL5LIB="$HOME/perl/modules/lib/perl5"

#Create output directory and switch to it; stop if either step fails
mkdir -p "$OUT/$NAME"
cd "$OUT/$NAME" || exit 1

#Run assembler (input paths are relative to the directory the script was started from)
"$CELERA/PBcR" -length 5000 -s "../../$SPEC" -l "$NAME" -fastq "../../$FILE" genomeSize=50000

 

I do not get an asm.asm or asm.qc file. I also don't see any obvious errors in the log files, although the log that the Celera Assembler produces is quite long and I may be missing something. The structure of the output (i.e., files and directories) looks like this:

|-- [NAME]
|   |-- 0-mercounts
|   |-- 0-mertrim
|   |-- 0-overlaptrim
|   |-- 0-overlaptrim-overlap
|   |-- 1-overlapper
|   |-- 3-overlapcorrection
|   |-- 4-unitigger
|   |-- 5-consensus
|   |-- 5-consensus-coverage-stat
|   |-- 5-consensus-insert-sizes
|   |-- asm.gkpStore
|   |-- asm.gkpStore.err
|   |-- asm.gkpStore.errorLog
|   |-- asm.gkpStore.fastqUIDmap
|   |-- asm.gkpStore.info
|   |-- asm.ovlStore
|   |-- asm.ovlStore.err
|   |-- asm.ovlStore.list
|   |-- asm.tigStore
|   `-- runCA-logs
|-- [NAME].correction.err
|-- [NAME].correction.hist
|-- [NAME].fasta
|-- [NAME].fastq
|-- [NAME].frg
|-- [NAME].log
|-- [NAME].longest25.fastq -> [NAME].fastq
|-- [NAME].longest25.frg -> [NAME].frg
|-- [NAME].qual
`-- temp[NAME]
    |-- 1-overlapper
    |-- [NAME].frg
    |-- [NAME].spec
    |-- asm.eidToIID
    |-- asm.gkpStore.err
    |-- asm.gkpStore.errorLog
    |-- asm.gkpStore.fastqUIDmap
    |-- asm.gkpStore.info
    |-- asm.hist
    |-- asm.ignore
    |-- asm.iidToLen
    |-- asm.layout.err
    |-- asm.layout.hist
    |-- asm.layout.success
    |-- asm.ovlStore.err
    |-- asm.ovlStore.list
    |-- asm.seedlength
    |-- asm.split.allEdit
    |-- asm.split.uid
    |-- asm.toerase.err
    |-- asm.toerase.out
    |-- asm.toerase.uid
    |-- asm.totalInputBP
    |-- corrected.log
    |-- runCA-logs
    |-- runCorrection.sh
    `-- runPartition.sh
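
A listing like the one above can be summarized mechanically. The following is a minimal sketch, not part of the assembler: check_assembly_outputs is a hypothetical helper that prints the numbered stage directories found directly under an assembly directory and then searches everything below it for asm.asm or asm.qc. It assumes only that stage directories are named with a digit, a dash, and a label, as they are in the listing.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the Celera Assembler).
# Usage: check_assembly_outputs <assembly-directory>
check_assembly_outputs() {
    local dir=$1

    echo "Stage directories present:"
    # Stage directories in the listing look like "5-consensus": digit, dash, label.
    find "$dir" -mindepth 1 -maxdepth 1 -type d -name '[0-9]-*' | sed 's|.*/||' | sort

    echo "Final output files found:"
    # Search the whole tree, since the exact location of asm.asm / asm.qc is not assumed.
    find "$dir" -type f \( -name 'asm.asm' -o -name 'asm.qc' \) | sort
}
```

Calling it as check_assembly_outputs "$OUT/$NAME/$NAME" would cover the inner [NAME] directory from the listing. An empty second section means no asm.asm or asm.qc was written anywhere below that directory, and the highest-numbered stage in the first section shows roughly how far the run got.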

 

So my questions are as follows:

- Why am I not getting an asm.asm file (the assembly, I assume) or an asm.qc file (assembly statistics)?

- If the assembly failed, where in the logs can I find an indication of why it failed?

- The lambda example included a parameter called -partitions. What does this parameter do? I couldn't find an explanation for it, and I didn't include it in my script.

- The raw data that we received all had the suffix .subreads.fastq. Is there a post-processing step that needs to be run before I run the assembly?
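
One quick sanity check on the input, independent of the assembler, is to count how many subreads are at least as long as the value passed to -length in the script (5000). This is a minimal sketch: count_long_reads is a hypothetical helper, it assumes a plain four-line-per-record FASTQ file, and whether -length acts as a minimum read length should be confirmed against the PBcR documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads in a FASTQ file that meet a minimum length.
# Usage: count_long_reads <file.fastq> <min-length>
count_long_reads() {
    local fastq=$1 min_len=$2
    # Assumes 4-line FASTQ records, so the sequence is line 2 of every record.
    awk -v min="$min_len" '
        NR % 4 == 2 { total++; if (length($0) >= min) keep++ }
        END { printf "%d of %d reads are >= %d bp\n", keep, total, min }
    ' "$fastq"
}
```

For example, count_long_reads reads.subreads.fastq 5000, where reads.subreads.fastq is a placeholder file name. If very few reads pass the threshold, the run may simply have too little data left to assemble.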

pacbio hgap celera assembler • 1.2k views
modified 3.4 years ago by rhall160 • written 3.4 years ago by tptacek305060
3.4 years ago by
rhall160
United States
rhall160 wrote:

The assembly failed during the 5-consensus stage. Check the [NAME]/runCA-logs directory for the specific task failure. My guess would be utgcns, possibly memory related.
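
As a rough way to do that check, the sketch below lists the files in a runCA-logs directory that contain failure-like words. scan_runca_logs is a hypothetical helper and the keyword list is a guess, not something the assembler documents. Each log starts with a standard "Error Rates:" header (AS_OVL_ERROR_RATE and so on), so those lines are filtered out first to avoid flagging every file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the log files that mention a failure-like keyword.
# Usage: scan_runca_logs <path-to-runCA-logs>
scan_runca_logs() {
    local log_dir=$1 f
    for f in "$log_dir"/*; do
        [ -f "$f" ] || continue
        # Drop the standard "Error Rates:" header lines so they do not count as hits,
        # then look for the (assumed) failure keywords, case-insensitively.
        if grep -vE 'Error Rates:|_ERROR_RATE' "$f" |
           grep -iqE 'error|fail|abort|killed|segmentation|bad_alloc|out of memory'; then
            echo "$f"
        fi
    done
}
```

An empty result does not prove the run succeeded. A task killed by the scheduler for exceeding memory may leave nothing in its own log, so the scheduler's output is worth checking as well.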

modified 3.4 years ago • written 3.4 years ago by rhall160

There was a _utgcnsfix file, but no _utgcns file. The contents of this file (1446144349_sipsey-compute-1-12.local_20669_utgcnsfix) were as follows:

 

CA version 8.3rc2 ($Id: utgcnsfix.C 4442 2013-10-04 14:33:50Z brianwalenz $).

Error Rates:
AS_OVL_ERROR_RATE 0.030000
AS_CNS_ERROR_RATE 0.100000
AS_CGW_ERROR_RATE 0.100000
AS_MAX_ERROR_RATE 0.400000

Current Working Directory:
/scratch/user/tptacek/Vikram/celera_output_4/H37Rv

Command:
/home/tptacek/wgs-8.3rc2/Linux-amd64/bin/utgcnsfix \
  -g /scratch/user/tptacek/Vikram/celera_output_4/H37Rv/H37Rv/asm.gkpStore \
  -t /scratch/user/tptacek/Vikram/celera_output_4/H37Rv/H37Rv/asm.tigStore 2 001 \
  -o /scratch/user/tptacek/Vikram/celera_output_4/H37Rv/H37Rv/5-consensus/asm_001.fixes

 

I browsed through the other files in this directory and didn't see any obvious error messages; they all looked similar to the one above. The contents of the runCA-logs directory look like this:

 

-rw-r--r-- 1 tptacek genetics 1762 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20273_runCA
-rw-r--r-- 1 tptacek genetics  520 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20280_gatekeeper
-rw-r--r-- 1 tptacek genetics  430 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20283_gatekeeper
-rw-r--r-- 1 tptacek genetics  443 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20285_gatekeeper
-rw-r--r-- 1 tptacek genetics  456 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20288_gatekeeper
-rw-r--r-- 1 tptacek genetics  543 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20290_initialTrim
-rw-r--r-- 1 tptacek genetics  443 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20291_gatekeeper
-rw-r--r-- 1 tptacek genetics  339 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20293_meryl
-rw-r--r-- 1 tptacek genetics  606 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20295_meryl
-rw-r--r-- 1 tptacek genetics  475 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20297_estimate-mer-threshold
-rw-r--r-- 1 tptacek genetics  458 Oct 29 13:45 1446144324_sipsey-compute-1-12.local_20299_meryl
-rw-r--r-- 1 tptacek genetics  339 Oct 29 13:45 1446144325_sipsey-compute-1-12.local_20300_meryl
-rw-r--r-- 1 tptacek genetics  458 Oct 29 13:45 1446144325_sipsey-compute-1-12.local_20302_meryl
-rw-r--r-- 1 tptacek genetics  580 Oct 29 13:45 1446144325_sipsey-compute-1-12.local_20305_overlap_partition
-rw-r--r-- 1 tptacek genetics  760 Oct 29 13:45 1446144325_sipsey-compute-1-12.local_20321_overlapInCore
-rw-r--r-- 1 tptacek genetics  671 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20374_overlapStoreBuild
-rw-r--r-- 1 tptacek genetics  671 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20380_overlapStoreBuild
-rw-r--r-- 1 tptacek genetics  840 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20384_deduplicate
-rw-r--r-- 1 tptacek genetics  643 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20386_finalTrim
-rw-r--r-- 1 tptacek genetics  637 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20388_chimera
-rw-r--r-- 1 tptacek genetics  571 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20392_overlap_partition
-rw-r--r-- 1 tptacek genetics  744 Oct 29 13:45 1446144332_sipsey-compute-1-12.local_20408_overlapInCore
-rw-r--r-- 1 tptacek genetics  620 Oct 29 13:45 1446144340_sipsey-compute-1-12.local_20442_overlapStoreBuild
-rw-r--r-- 1 tptacek genetics  615 Oct 29 13:45 1446144340_sipsey-compute-1-12.local_20454_correct-frags
-rw-r--r-- 1 tptacek genetics  707 Oct 29 13:45 1446144343_sipsey-compute-1-12.local_20484_correct-olaps
-rw-r--r-- 1 tptacek genetics  537 Oct 29 13:45 1446144347_sipsey-compute-1-12.local_20632_overlapStore
-rw-r--r-- 1 tptacek genetics  747 Oct 29 13:45 1446144347_sipsey-compute-1-12.local_20635_bogart
-rw-r--r-- 1 tptacek genetics  524 Oct 29 13:45 1446144348_sipsey-compute-1-12.local_20652_gatekeeper
-rw-r--r-- 1 tptacek genetics  595 Oct 29 13:45 1446144348_sipsey-compute-1-12.local_20661_gatekeeper
-rw-r--r-- 1 tptacek genetics  527 Oct 29 13:45 1446144348_sipsey-compute-1-12.local_20663_tigStore
-rw-r--r-- 1 tptacek genetics  619 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20667_tigStore
-rw-r--r-- 1 tptacek genetics  590 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20669_utgcnsfix
-rw-r--r-- 1 tptacek genetics  605 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20674_tigStore
-rw-r--r-- 1 tptacek genetics  573 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20678_tigStore
-rw-r--r-- 1 tptacek genetics  561 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20682_gatekeeper
-rw-r--r-- 1 tptacek genetics  651 Oct 29 13:45 1446144349_sipsey-compute-1-12.local_20685_computeCoverageStat
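
The log names in this listing appear to follow the pattern <epoch seconds>_<host>_<pid>_<task>, so the order of tasks and the total time they span can be read off the names alone. The sketch below does that. summarize_runca_logs is a hypothetical helper, and it assumes the host name contains no underscores and that process IDs increase within a given second.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list logged tasks in order and report the time they span.
# Usage: summarize_runca_logs <path-to-runCA-logs>
summarize_runca_logs() {
    local log_dir=$1
    # Sort by the epoch prefix (field 1), then by pid (field 3), both numerically.
    ls "$log_dir" | sort -t_ -k1,1n -k3,3n | awk -F_ '
        NR == 1 { first = $1 }
        {
            last = $1
            # The task name is everything after the pid and may itself contain "_".
            task = $4
            for (i = 5; i <= NF; i++) task = task "_" $i
            print task
        }
        END { if (NR) printf "elapsed: %d s, last task: %s\n", last - first, task }
    '
}
```

In the listing above the first and last prefixes are 1446144324 and 1446144349, so all logged tasks ran within 25 seconds, and the last task logged is computeCoverageStat.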

 

Does any of this look unusual? In the meantime, I'll queue up another run with increased memory.

written 3.4 years ago by tptacek305060