PacBio genome assembly with Canu shorter than expected
7.3 years ago
Rob ▴ 150

Hi,

I have ~70x coverage of PacBio reads and assembled them with Canu. I expected a 216 Mb genome, but I got a 146 Mb assembly in 465 contigs, with 1.3G of unassembled data.
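
As a rough sanity check on these figures, here is a minimal Python sketch of the arithmetic. It assumes the ~70x coverage is measured against the expected 216 Mb genome size; the variable and function names are only illustrative.

```python
# Figures quoted in the post above, in megabases (Mb).
EXPECTED_GENOME_MB = 216
ASSEMBLY_MB = 146
COVERAGE_X = 70  # approximate; assumed to be relative to the expected size


def total_read_bases_mb(genome_mb, coverage):
    """Total sequenced bases (Mb) implied by a genome size and a coverage depth."""
    return genome_mb * coverage


def assembled_fraction(assembly_mb, genome_mb):
    """Share of the expected genome size that the assembly accounts for."""
    return assembly_mb / genome_mb


read_bases_mb = total_read_bases_mb(EXPECTED_GENOME_MB, COVERAGE_X)
fraction = assembled_fraction(ASSEMBLY_MB, EXPECTED_GENOME_MB)
missing_mb = EXPECTED_GENOME_MB - ASSEMBLY_MB

print(f"Implied read data: {read_bases_mb / 1000:.2f} Gb")
print(f"Assembled fraction of expected size: {fraction:.1%}")
print(f"Shortfall: {missing_mb} Mb")
```

So the assembly covers about two thirds of the expected size, which is a large enough gap that it seems worth checking both the input reads and the genome size estimate.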

I tried modifying some parameters (overlap length, coverage), but I can't get an assembly that reaches the size I expected: the changes either increased the number of contigs or had no visible effect.

Is there any way to increase the size of my assembly by adjusting the assembler's parameters, or could there be a problem with my data? (I haven't polished anything yet because I'm having trouble with Quiver at the moment, but I don't expect Quiver to increase the size of my assembly. Am I right?)

What could explain this difference, and what can I do about it?

Thanks for your help!

PacBio Canu Assembly genome • 3.1k views

Did you first try correcting your reads with a self-correction method such as LoRMA (https://www.cs.helsinki.fi/u/lmsalmel/LoRMA/), and then assembling them to see the results?


I assumed Canu corrects the reads by itself, so no, I didn't correct them beforehand. But I will try your tool; it seems useful.


You are right: Canu corrects the reads itself. I missed that.


Hi, did you get the expected genome size in the end?

4.3 years ago
Lancer • 0
  1. Have you changed minReadLength (default 1000)? I suggest looking at your read length distribution; maybe many of your reads are too short. Reads shorter than 1000 bp are discarded. Please read the Canu documentation and its FAQ carefully. That will be helpful.
  2. In the assembly step, you can change correctedErrorRate according to your coverage:
     1. For low coverage: for less than 30X coverage, increase the allowed difference in overlaps by a few percent (from 4.5% to 8.5% (or more) with correctedErrorRate=0.105 for PacBio, and from 14.4% to 16% (or more) with correctedErrorRate=0.16 for Nanopore), to adjust for inferior read correction. Canu will automatically reduce corMinCoverage to zero to correct as many reads as possible.
     2. For high coverage: for more than 60X coverage, decrease the allowed difference in overlaps (from 4.5% to 4.0% with correctedErrorRate=0.040 for PacBio, and from 14.4% to 12% with correctedErrorRate=0.12 for Nanopore), so that only the better corrected reads are used. This is primarily an optimization for speed and generally does not change assembly continuity.
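
The two checks above can be sketched in Python roughly as follows. This is only an illustration, not part of Canu: the function names are made up here, the FASTQ parser assumes plain four-line records, and the demo reads are synthetic. The thresholds and correctedErrorRate values are the ones quoted in the list above.

```python
from statistics import median

CANU_DEFAULT_MIN_READ_LENGTH = 1000  # default quoted in the answer above


def fastq_read_lengths(lines):
    """Yield read lengths from FASTQ lines, assuming exactly 4 lines per record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            yield len(line.strip())


def summarize(lengths, min_len=CANU_DEFAULT_MIN_READ_LENGTH):
    """Count reads and bases, and how many would survive a minimum length cutoff."""
    lengths = list(lengths)
    kept = [n for n in lengths if n >= min_len]
    return {
        "reads": len(lengths),
        "reads_kept": len(kept),
        "bases": sum(lengths),
        "bases_kept": sum(kept),
        "median_length": median(lengths) if lengths else 0,
    }


def suggested_corrected_error_rate(coverage, platform="pacbio"):
    """Return the correctedErrorRate quoted above, or None to keep the default."""
    low = {"pacbio": 0.105, "nanopore": 0.16}    # less than 30X coverage
    high = {"pacbio": 0.040, "nanopore": 0.12}   # more than 60X coverage
    if coverage < 30:
        return low[platform]
    if coverage > 60:
        return high[platform]
    return None


# Synthetic demo records (not real data): reads of 1500, 800 and 1000 bases.
demo = []
for name, n in [("r1", 1500), ("r2", 800), ("r3", 1000)]:
    demo += [f"@{name}", "A" * n, "+", "I" * n]

stats = summarize(fastq_read_lengths(demo))
print(stats)
print(suggested_corrected_error_rate(70, "pacbio"))
```

For real data you would pass an open file handle instead of `demo`, for example `summarize(fastq_read_lengths(open("reads.fastq")))`, where `reads.fastq` is a placeholder path. If `bases_kept` is much smaller than `bases`, lowering minReadLength may be worth trying.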