Question: Assembling PacBio reads using a reference genome
21 months ago
celeste wrote:

I'm new to resequencing strategies and would like to know whether there is any specific software for assembling PacBio reads using a reference genome, i.e. a reference-based genome assembly using PacBio reads. I have to assemble a plant genome of about 400 Mb (and I have the reference genome).

OK, sorry, I'll try to explain it better. I don't want to perform a hybrid assembly, and I don't want to improve an existing assembly by using PacBio reads (as is typically done for filling gaps and joining contigs into scaffolds). The reference assembly I have belongs to a different species, so I want to use it exclusively as guidance to ease the assembly of the PacBio reads and resolve the ambiguities. The reference assembly must not be part of the final output. I know from experience that CLC Genomics Workbench can do this kind of process very well for the assembly of Illumina reads. I hope I have explained the situation better. Thank you all for your interest in my post!

I personally do not know of any reference-guided PacBio assemblers. You can:

1) Map the reads to the reference, go for local assemblies and then scaffold based on the complete set of PacBio reads

2) De novo assemble everything, then scaffold using the reference

The 2nd option is better: if there are large insertions in your species, they might be missed if you rely on local assemblies alone.
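To make option 2 a bit more concrete, here is a minimal sketch of the "scaffold using the reference" step, assuming you have already aligned your de novo contigs to the reference with an aligner of your choice and parsed the hits into simple records. The `Hit` fields and the function names `order_contigs` and `unplaced` are hypothetical, made up for illustration, and not part of any existing tool. It places each contig by its longest alignment and reports contigs with no hit separately, since those may carry the large insertions mentioned above.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One contig-to-reference alignment (hypothetical, simplified record)."""
    contig: str
    ref_chrom: str
    ref_start: int
    aligned_bp: int
    strand: str  # '+' or '-'


def order_contigs(hits):
    """Order contigs along each reference chromosome.

    Each contig is placed once, using its longest alignment only.
    Returns {ref_chrom: [(contig, strand), ...]} sorted by ref_start.
    """
    best = {}
    for h in hits:
        if h.contig not in best or h.aligned_bp > best[h.contig].aligned_bp:
            best[h.contig] = h

    layout = defaultdict(list)
    for h in sorted(best.values(), key=lambda x: (x.ref_chrom, x.ref_start)):
        layout[h.ref_chrom].append((h.contig, h.strand))
    return dict(layout)


def unplaced(all_contigs, layout):
    """Contigs with no alignment at all; keep these, do not discard them."""
    placed = {name for contigs in layout.values() for name, _ in contigs}
    return [c for c in all_contigs if c not in placed]


if __name__ == "__main__":
    # Toy data, invented for the example.
    hits = [
        Hit("ctg2", "chr1", 5000, 4000, "-"),
        Hit("ctg1", "chr1", 100, 3000, "+"),
        Hit("ctg2", "chr2", 10, 500, "+"),   # shorter secondary hit, ignored
        Hit("ctg3", "chr2", 900, 2500, "+"),
    ]
    layout = order_contigs(hits)
    print(layout)
    print(unplaced(["ctg1", "ctg2", "ctg3", "ctg4"], layout))
```

Only the order and orientation are taken from the reference here; the sequence itself still comes entirely from your own contigs, so the reference does not end up in the final output. A real pipeline would also need to deal with chimeric contigs and rearrangements between the two species, which this sketch ignores.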

Do you mean alignment or mapping instead of assembly?

I think he is talking about a de novo assembly first, followed by combining the contigs with the reference genome, for example IDBA-Hybrid, although it is not for PacBio.

There is a paper about that here, also not for PacBio.

But I do not know of any tools that do this.

I don't really understand why you are trying to guide your assembly with a reference. My guess is that you already tried a first de novo assembly, without any reference, and that you were skeptical about the results, as you said you want a reference to resolve ambiguities.

What kind of ambiguities did you face? I'm just curious (I also had a rough time with PacBio assembly but finally managed to get a nice result, so maybe I can help you).

Bump! Does anyone know of any methods for the original question, not listed here, that may have appeared in the past 17 months?

We need to assemble genomes from plant cultivars that are different from the cultivar for which 2nd/3rd round genome assemblies are publicly available. We are hoping to find a way to incorporate data from the reference and at least one cultivar to improve the assembly and subsequently identify structural changes.

21 months ago
Istvan Albert
University Park, USA
Istvan Albert wrote:

The PacBio GitHub page has a list of the recommended software.

None of the software in the link depends on a reference, AFAIK.

20 months ago
Indianapolis, IN
tjduncan wrote:

"Long-read sequencing improves assembly of Trichinella genomes 10-fold, revealing substantial synteny between lineages diverged over 7 million years."

In this recent paper they do something similar to what I believe you are looking to do. They take a short-read genome assembly (T. spiralis), do a bunch of comparisons, and use it to help guide the assembly of a very similar species (T. murrelli) for which they have PacBio long-read data. Hopefully some of this paper is helpful.

