Question: PacBio assemblies only ending up somewhere between 80 and 250 contigs.
I have done a lot of de novo assembly with NGS data (Illumina NextSeq and MiSeq) and expect to only get a "pretty good" final assembly. However, I was under the impression that PacBio data improved on this greatly. I'm struggling to finalize my assemblies, though.

Currently I have tried the following assemblers:

  • Canu
  • HGAP4 (or whatever the pbsmrtpipe de novo assembly pipeline is called)
  • SOAPdenovo with hybrid mode (pacbio+illumina)

I generated my data from a multiplexed run on a PacBio Sequel machine and demultiplexed it with lima.

Of the assemblies, the hybrid did the best. The overall assembly contained ~500 contigs and was twice the expected genome size. However, if I filtered out contigs <10,000 base pairs, I ended up with 80 contigs whose total length is extremely close to the expected genome size.
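For anyone wanting to reproduce that length filter, here is a minimal Python sketch of one way to do it. It assumes the assembly is a plain FASTA file; the function names (`read_fasta`, `filter_contigs`, `summarize`) are made up for illustration and are not part of any assembler. Besides the contig count and total length, it reports N50 as an extra sanity check.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (contig name, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) pairs from the lines of a FASTA file."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""  # keep only the ID, drop the description
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(records: Iterable[Record], min_len: int = 10_000) -> List[Record]:
    """Keep contigs of at least min_len bases (drops contigs < 10,000 bp by default)."""
    return [(name, seq) for name, seq in records if len(seq) >= min_len]


def summarize(records: Iterable[Record]) -> Dict[str, int]:
    """Return contig count, total length, and N50 for a set of contigs."""
    lengths = sorted((len(seq) for _, seq in records), reverse=True)
    total = sum(lengths)
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:  # first contig that brings us to half the total
            n50 = length
            break
    return {"contigs": len(lengths), "total_bp": total, "n50": n50}


# Usage (the file name is a placeholder):
# with open("assembly.fasta") as handle:
#     kept = filter_contigs(read_fasta(handle), min_len=10_000)
# print(summarize(kept))
```

Comparing `total_bp` before and after filtering against the expected genome size shows how much of the excess sits in the short contigs.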

What do I do from here? I've tried circlator, which seems to only try to circularize the contigs themselves. My next step is to consider quickmerge to possibly finalize the assembly.

Has anyone else hit a similar stumbling block in trying to finish a genome using PacBio reads?

What is the organism? I suppose it is a bacterium, as you were trying circlator. It is really strange for the final assembly to be twice the expected genome size. Did you check for contaminants?

Not in depth, but I have performed Illumina sequencing on this same sample and there were no contaminants.

