I am trying to assemble a bacterial genome using PacBio reads and short Illumina reads from a metagenomic pool of both eukaryotic host and bacterial sequences.
In an ideal world, I would have been able to culture the bacterium and obtain pure DNA to sequence and assemble from, but cultivation has proved challenging. So, instead, we have a large amount of metagenomic data sequenced from whole host DNA. We have ~70 million Illumina MiSeq paired-end reads (2x150) and ~120,000 PacBio reads (5-20 kb). I suspect that I have roughly 3x coverage of the bacterial genome with the PacBio reads and 20-30x coverage with the Illumina reads (with 80-90% of the reads likely being host). This project is further complicated by the fact that the host genome is not fully sequenced, so I cannot remove known host sequence to simplify analysis and de novo assembly.
I have been trying to use Canu 1.4 to correct my PacBio reads, with a FASTQ file as input and the following command:
./canu -correct -pacbio-raw m170212_104155_42146_c101160022550000001823267305221711_s1_p0.3.subreads.fastq -d subreads3-auto -p subreads3 genomeSize=1.5m gnuplotTested=true corMinCoverage=0 errorRate=0.035 corMaxEvidenceErate=0.15
But I keep getting this error: ERROR: File supplied on command line; use -s, -pacbio-raw, -pacbio-corrected, -nanopore-raw, or -nanopore-corrected.
It appears that Canu is not recognizing my input FASTQ file "m170212_104155_42146_c101160022550000001823267305221711_s1_p0.3.subreads.fastq". Has anyone else run into this error, and does anyone know how to fix it?
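To help rule out a problem with the file itself, a quick sanity check along these lines could be run before calling Canu. This is only a sketch: `check_fastq` is a made-up helper name, not a Canu command, and it assumes an uncompressed, unwrapped FASTQ (four lines per record) that ends with a newline.

```shell
#!/usr/bin/env bash
# check_fastq: hypothetical helper (not part of Canu) that performs
# basic sanity checks on an uncompressed, unwrapped FASTQ file.
check_fastq() {
    local fq="$1"

    # The file must exist and be readable by the current user.
    if [ ! -r "$fq" ]; then
        echo "missing or unreadable: $fq"
        return 1
    fi

    # The first FASTQ header line should begin with '@'.
    if [ "$(head -c 1 "$fq")" != "@" ]; then
        echo "first character is not '@': $fq"
        return 1
    fi

    # Unwrapped FASTQ has four lines per record, so the line count
    # should be a multiple of 4 (assumes a trailing newline).
    local lines
    lines=$(( $(wc -l < "$fq") ))
    if [ $((lines % 4)) -ne 0 ]; then
        echo "line count $lines is not a multiple of 4: $fq"
        return 1
    fi

    echo "ok: $((lines / 4)) records in $fq"
}
```

For example, `check_fastq m170212_104155_42146_c101160022550000001823267305221711_s1_p0.3.subreads.fastq` would be run from the directory that holds the file. Passing these checks would only show that the file is readable and looks like FASTQ; it would not by itself explain the Canu error.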
Thanks for any help you can provide!