Question: PacBio assemblies only ending up somewhere between 80 and 250 contigs.
11 months ago by
dylan.lawrence10 wrote:

I have done a lot of de novo assembly with NGS data (Illumina NextSeq and MiSeq), where I expect to get only a "pretty good" final assembly. With PacBio, however, I was under the impression that this would improve greatly, yet I'm struggling to finalize my assemblies.

Currently I have tried the following assemblers:

  • Canu
  • HGAP4 (or whatever the pbsmrtpipe de novo assembly pipeline is called)
  • SOAPdenovo in hybrid mode (PacBio + Illumina)

I generated my data from a multiplexed run on a PacBio Sequel machine and demultiplexed it with lima.

Of the assemblies, the hybrid did the best. The overall assembly contained ~500 contigs and was twice the expected genome size. However, when I filtered out contigs <10,000 base pairs, I ended up with 80 contigs whose combined length is extremely close to the expected genome size.
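
For reference, the length filter described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the exact procedure used for this assembly: the function names (`read_fasta`, `filter_contigs`) and the inline toy FASTA are hypothetical, and only the 10,000 bp default cutoff comes from the post.

```python
import io

def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def filter_contigs(records, min_len=10_000):
    """Keep contigs with length >= min_len; return (kept, total_bp)."""
    kept = [(n, s) for n, s in records if len(s) >= min_len]
    total_bp = sum(len(s) for _, s in kept)
    return kept, total_bp

# Toy input (hypothetical contigs) with a tiny cutoff so the example runs instantly.
toy = io.StringIO(">ctg1\nACGTACGTAC\n>ctg2\nACG\n>ctg3\nACGTACGT\nACGT\n")
kept, total_bp = filter_contigs(read_fasta(toy), min_len=10)
print(len(kept), total_bp)
```

On a real assembly, the same two functions would be pointed at the assembler's FASTA output, and `total_bp` compared against the expected genome size.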

What do I do from here? I've tried Circlator, which seems to only try to circularize the contigs themselves. My next step is to consider quickmerge to possibly finalize the assembly.

Has anyone else hit a similar stumbling block in trying to finish a genome using PacBio reads?


What is the organism? I suppose it is a bacterium, as you were trying Circlator. It is really strange that the final assembly is twice the expected genome size. Did you check for contaminants?

written 11 months ago by h.mon26k

Not in depth, but I have performed Illumina sequencing on this same sample and there were no contaminants.

written 11 months ago by dylan.lawrence10

I think the problem here is that the genome is diploid or possibly polyploid. For diploid or polyploid genomes, the assembly is often larger than the haploid genome size (which is the size OP is expecting), because divergent haplotypes get assembled as separate contigs.

OP, I think you can filter these out with genome-vs-genome alignments: align the assembly against itself, and the redundant haplotype contigs will show very high identity to other contigs. You can then remove them.
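
One way to act on the self-alignment idea is sketched below. It assumes the assembly has already been aligned against itself with an aligner that writes PAF output (minimap2 is one such tool), and it only parses that output. The function name `redundant_contigs` and the 0.9 coverage and 0.95 identity thresholds are hypothetical choices for illustration, not recommended values.

```python
def redundant_contigs(paf_lines, min_cov=0.9, min_ident=0.95):
    """Return names of contigs that are mostly covered, at high identity,
    by a single alignment to a different, longer contig.

    Uses the first 11 standard PAF columns: query name/length/start/end,
    strand, target name/length/start/end, matching bases, alignment length.
    """
    redundant = set()
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:
            continue  # skip malformed or empty lines
        qname, qlen, qstart, qend = f[0], int(f[1]), int(f[2]), int(f[3])
        tname, tlen = f[5], int(f[6])
        nmatch, alnlen = int(f[9]), int(f[10])
        # Skip self-hits and only ever flag the shorter contig of a pair
        # (ties on length are broken by name so both are never flagged).
        if (qlen, qname) >= (tlen, tname):
            continue
        if qlen == 0 or alnlen == 0:
            continue
        coverage = (qend - qstart) / qlen   # fraction of the query aligned
        identity = nmatch / alnlen          # matching bases per aligned column
        if coverage >= min_cov and identity >= min_ident:
            redundant.add(qname)
    return redundant

# Toy PAF records (hypothetical contig names and coordinates).
toy_paf = [
    "hapB\t1000\t0\t980\t+\thapA\t5000\t100\t1080\t970\t985\t60",
    "ctgC\t2000\t0\t300\t+\thapA\t5000\t2000\t2300\t295\t300\t60",
    "hapA\t5000\t100\t1080\t+\thapB\t1000\t0\t980\t970\t985\t60",
]
print(sorted(redundant_contigs(toy_paf)))
```

This sketch looks at one alignment at a time, so a contig covered by several shorter alignments would be missed, and a long high-identity repeat could be flagged by mistake. Flagged contigs are best inspected before anything is removed.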

If you can post the parameters that you used, we can probably suggest better settings for your assembly.

written 10 months ago by harish200