Question: PacBio assemblies only ending up somewhere between 80 and 250 contigs.
2.1 years ago, dylan.lawrence (20) wrote:

I have done a lot of de novo assembly with NGS data (Illumina NextSeq and MiSeq) and only expect a "pretty good" final assembly from it. With PacBio, however, I was under the impression that assemblies improve greatly. I'm struggling to finalize my assemblies, though.

Currently I have tried the following assemblers:

  • Canu
  • HGAP4 (or whatever the pbsmrtpipe de novo assembly pipeline is called)
  • SOAPdenovo in hybrid mode (PacBio + Illumina)

I generated my data from a multiplexed run on a PacBio Sequel machine and demultiplexed it with lima.

Of the assemblies, the hybrid one did the best. The overall assembly contained ~500 contigs and was twice the expected genome size. However, if I filtered out contigs <10,000 base pairs, I ended up with 80 contigs whose total length is extremely close to the expected genome size.
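
For anyone who wants to repeat that length filter, here is a minimal sketch of the idea, not the exact commands used above. It reads a FASTA file with plain Python, keeps contigs at or above a cutoff, and reports how many remain and their combined length. The function names `read_fasta` and `summarize` and the default cutoff are illustrative assumptions.

```python
def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the contig name.
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize(handle, min_len=10_000):
    """Return (contig_count, total_bp) for contigs with length >= min_len."""
    lengths = [len(seq) for _, seq in read_fasta(handle)]
    kept = [n for n in lengths if n >= min_len]
    return len(kept), sum(kept)
```

Called as `summarize(open("assembly.fasta"))`, where `assembly.fasta` is a placeholder file name, it returns the contig count and total bases that can be compared with the expected genome size.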

What do I do from here? I've tried Circlator, which seems to only try to circularize the contigs themselves. My next step is to consider quickmerge to possibly finalize the assembly.

Has anyone else hit a similar stumbling block in trying to finish a genome using PacBio reads?

pacbio assembly de novo • 737 views

What is the organism? I suppose it is a bacterium, as you were trying Circlator. It is really strange that the final assembly is twice the expected genome size. Did you check for contaminants?

written 2.1 years ago by h.mon (30k)

Not in depth, but I have performed Illumina sequencing on this same sample and there were no contaminants.

written 2.1 years ago by dylan.lawrence (20)

I think the problem here is that the genome is diploid or possibly polyploid. For diploid or polyploid genomes, the assembly is generally larger than the haploid genome size, because divergent haplotypes can be assembled as separate contigs. The haploid size is what the OP is expecting.

OP, I think you can filter these out using genome-vs-genome alignments. The redundant haplotype sequences will align to each other with very high identity, so you can then remove the duplicates.
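
A rough sketch of that filter, assuming the assembly has been aligned against itself and the result saved in PAF format (the tab-separated output of aligners such as minimap2). It flags a contig as redundant when a single alignment covers most of it at high identity against a longer contig. The function name and both thresholds are illustrative assumptions, and only one alignment at a time is considered, so contigs covered by several shorter hits would be missed.

```python
def redundant_contigs(paf_lines, min_identity=0.95, min_covered=0.8):
    """Return names of contigs largely contained in a longer contig.

    Uses the standard PAF columns: 1 query name, 2 query length,
    3 query start, 4 query end, 6 target name, 7 target length,
    10 matching bases, 11 alignment block length.
    """
    flagged = set()
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # not a complete PAF record
        qname, qlen, qstart, qend = f[0], int(f[1]), int(f[2]), int(f[3])
        tname, tlen = f[5], int(f[6])
        matches, aln_len = int(f[9]), int(f[10])
        # Only flag the shorter contig of a pair; ties are broken by name
        # so two equal-length contigs never flag each other. This also
        # skips self-alignments.
        if (qlen, qname) >= (tlen, tname):
            continue
        if qlen == 0 or aln_len == 0:
            continue
        identity = matches / aln_len
        covered = (qend - qstart) / qlen
        if identity >= min_identity and covered >= min_covered:
            flagged.add(qname)
    return flagged
```

The flagged names are only candidates. It is worth checking them by eye before dropping anything, since repeats and plasmids can also produce high-identity hits.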

If you post the parameters you used, we can probably suggest better settings for your assembly.

written 2.0 years ago by harish (290)