I am trying to assemble bacterial genomes from PacBio reads using Canu.
ISSUES I FOUND:
I ran into some issues with the draft assemblies: some "undesired DNA" is present in the reads, and BLAST searches show that the draft assemblies do not match the reference sequences well.
I have been advised to use some parameters that are intended for assembling very large genomes (e.g., human), namely:
ADVISED SETTINGS FOR ASSEMBLING MY BACTERIAL GENOMES
corMhapSensitivity=high 'corOutCoverage=100' 'batOptions=-dg 3 -db 3 -dr 1 -ca 500 -cp 50' 'corMinCoverage=0' 'corErrorRate=0.5'
Reading the Canu manual, I found that some of these settings are recommended in the following situations:
2.10 My assembly continuity is not good, how can I improve it?
"... having output coverage below 20-25X is a sign that correction did not work well (assuming you have more input coverage than that)." ... "re-running with corMhapSensitivity=normal if you have >50X or corMhapSensitivity=high corMinCoverage=0 otherwise can help." You can also increase the target coverage to correct corOutCoverage=100 to get more correct sequences for assembly.
2.11 What parameters can I tweak?
To avoid collapsing the (polyploid) genome:
corOutCoverage=200 "batOptions=-dg 3 -db 3 -dr 1 -ca 500 -cp 50"
For low coverage: "For less than 30X coverage, increase the allowed difference in overlaps by a few percent" ... "for PacBio and from 14.4% to 16% (or more) with correctedErrorRate=0.16"
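To see whether the FAQ situations quoted above even apply, it may help to write the 2.10 and 2.11 advice out as an explicit decision rule. The sketch below is not part of Canu; `suggest_canu_options` and `to_canu_args` are hypothetical helpers, and the 25X cutoff is an assumption taken from the upper end of the FAQ's "20-25X" range. It deliberately leaves out the low-coverage rule from 2.11, because the excerpt above is truncated.

```python
def suggest_canu_options(raw_coverage: float,
                         corrected_coverage: float,
                         polyploid: bool = False) -> dict[str, str]:
    """Hypothetical helper encoding the Canu FAQ 2.10/2.11 advice quoted above.

    Returns a mapping of Canu option name -> value. An empty dict means
    none of the encoded FAQ situations applies.
    """
    LOW_OUTPUT_COVERAGE = 25  # assumption: upper end of the FAQ's "20-25X"
    opts: dict[str, str] = {}

    # FAQ 2.10: low output coverage signals poor correction, provided the
    # input coverage is higher than that threshold.
    if corrected_coverage < LOW_OUTPUT_COVERAGE and raw_coverage > LOW_OUTPUT_COVERAGE:
        if raw_coverage > 50:
            opts["corMhapSensitivity"] = "normal"
        else:
            opts["corMhapSensitivity"] = "high"
            opts["corMinCoverage"] = "0"
        # Optional per FAQ 2.10: correct more reads for assembly.
        opts["corOutCoverage"] = "100"

    # FAQ 2.11: these settings exist to avoid collapsing a polyploid genome.
    if polyploid:
        opts["corOutCoverage"] = "200"
        opts["batOptions"] = "-dg 3 -db 3 -dr 1 -ca 500 -cp 50"

    return opts


def to_canu_args(opts: dict[str, str]) -> list[str]:
    """Format options as key=value arguments, e.g. for a subprocess.run list."""
    return [f"{key}={value}" for key, value in opts.items()]
```

For example, `suggest_canu_options(raw_coverage=300, corrected_coverage=38)` (the raw value of 300 is a made-up placeholder for "large") returns an empty dict, and so does the call with `corrected_coverage=240`, which is the mismatch described below. Passing the list from `to_canu_args` directly to `subprocess.run` keeps the space-separated `batOptions` value as a single argument, so no extra shell quoting is needed.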
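I am a bit confused, because the advised settings do not seem to fit my data set:
My genomes have high raw coverage, and the output coverages are 240x and 38x, both above the 20-25X level that 2.10 describes as a sign that correction did not work well.
My data are from bacteria, whose genomes are expected to be haploid, so the polyploid advice in 2.11 does not seem to apply either.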
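To put a number on "high raw coverage", a rough estimate is total sequenced bases divided by the expected genome size. This is a minimal sketch, assuming plain (uncompressed) FASTQ with unwrapped 4-line records; the function names are hypothetical and the genome size has to be your own estimate.

```python
from typing import Iterable, Iterator


def fastq_read_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield read lengths from FASTQ text (assumes 4-line, unwrapped records)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            yield len(line.rstrip("\r\n"))


def raw_coverage(lines: Iterable[str], genome_size: int) -> float:
    """Approximate coverage = total sequenced bases / genome size."""
    return sum(fastq_read_lengths(lines)) / genome_size
```

Usage, with a hypothetical file name and a guessed genome size of 5 Mb: `with open("reads.fastq") as fh: print(raw_coverage(fh, 5_000_000))`. For gzipped reads, `gzip.open("reads.fastq.gz", "rt")` yields text lines the same way.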
Dear friends from Biostars, are the ADVISED SETTINGS FOR ASSEMBLING MY BACTERIAL GENOMES sound, given the ISSUES I FOUND and the data I have at hand?