Question: Reassemble Illumina and PacBio reads together, or just upgrade the previous assembly with PacBio?
msobol (9 months ago) wrote:


I previously assembled a fungal genome from Illumina HiSeq paired-end reads. The assembly was ~32 Mbp and made up of ~400 contigs. I did not try to join the contigs into scaffolds. BUSCO estimated that the assembly was ~98% complete, based on the conserved single-copy orthologs it found.
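Besides BUSCO completeness, contiguity is the obvious number to compare between the Illumina-only assembly and any PacBio-based one. The following is a minimal sketch, assuming plain (uncompressed) FASTA files; `fasta_lengths`, `n50` and `summarize` are hypothetical helper names, not part of any assembler.

```python
from typing import Iterable, List


def fasta_lengths(path: str) -> List[int]:
    """Return the length of every sequence in a (possibly multi-line) FASTA file."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def summarize(path: str) -> dict:
    """Contig count, total length and N50 for one assembly FASTA."""
    lengths = fasta_lengths(path)
    return {"contigs": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}
```

Calling `summarize` on the old contigs and on each new assembly (file names are up to you) gives a like-for-like comparison alongside the BUSCO scores.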

However, I just received PacBio sequences from the same isolates and want to use them to improve the assembly and possibly close the genome. Should I reassemble the Illumina and PacBio reads together using SPAdes or another hybrid assembler, or should I update the existing Illumina assembly with the PacBio reads using PBJelly?

Thanks in advance! Morgan

Tags: illumina, pacbio, assembly, genome

Question written 9 months ago by msobol; modified 8 months ago by harish.

Try everything you can (a short summary of the much longer answer posted by @h.mon below).

Reply written 9 months ago by genomax.

I don't know if you were able to solve your problem by now, but I'll never stop citing this article:

It presents different assembly strategies depending on the available coverage, and also introduces a tool you may want to try: quickmerge, which can merge two different kinds of assembly to improve your results.

Let me know if this information was of any use to you :)

Reply written 8 months ago by Roxane Boyer.

I also recently stumbled upon quickmerge and am currently applying it to my data. I must say I'm a fan by now: it runs really fast and the results are more than OK.

Reply written 8 months ago by lieven.sterck.

Glad to hear I'm not the only quickmerge fan :)

Reply written 8 months ago by Roxane Boyer.

Add me to that list!

If you can hack around the scripts or compute the deltas with MUMmer 4, it becomes even faster.

Along the same lines, you can also try CAMSA.

Reply written 8 months ago by harish.

Interesting comment, harish; I haven't gotten that deep into it yet. Care to share some insight on how to switch to MUMmer4?

EDIT: is it as easy as pointing it to the location of the MUMmer4 binaries?

Thanks for the tip, I will look into it.

Reply written 8 months ago (later modified) by lieven.sterck.
h.mon (9 months ago) wrote:

Edit: as soon as I posted, it occurred to me that it is probably better to perform a de novo hybrid assembly, as I think this has a better chance of correcting small duplications that may have been misassembled in the Illumina-only assembly.

End of edit.

The best approach probably depends on the coverage of both the Illumina and the PacBio data, according to this page: Large Genome Assembly with PacBio Long Reads

(see its algorithm suggestions)
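Since the choice hinges on coverage, it helps to compute it for each dataset: expected fold coverage is total sequenced bases divided by genome size. A minimal sketch, assuming FASTQ files with standard 4-line records (plain or gzipped); the helper names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator


def fastq_read_lengths(path: str) -> Iterator[int]:
    """Yield read lengths from a plain or gzipped FASTQ file.

    Assumes standard 4-line records (sequence not wrapped over several lines).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                yield len(line.rstrip("\r\n"))


def fold_coverage(read_lengths: Iterable[int], genome_size_bp: int) -> float:
    """Expected mean depth: total sequenced bases divided by genome size."""
    return sum(read_lengths) / genome_size_bp
```

For example, 100,000 PacBio reads of 10 kb on a 32 Mbp genome give `fold_coverage([10_000] * 100_000, 32_000_000)`, i.e. 31.25X. For paired-end Illumina data, count both files, e.g. by passing `itertools.chain` over the R1 and R2 lengths.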

In any case, do not forget to polish your assembly after incorporating the PacBio reads:

On stuck records and indel errors; or “stop publishing bad genomes”

Answer written 9 months ago (later modified) by h.mon.

To add to this, I'd be inclined to go with a hybrid reassembly, as some short reads that didn't map before might now map, giving you confidence in more of the PacBio base calls.

Reply written 9 months ago by jrj.healey.
colindaven (Hannover Medical School, 9 months ago) wrote:

I would just assemble the new PacBio sequences de novo, e.g. with Canu. I would be surprised if you didn't have 40X+ coverage. The PacBio assembly is going to be on a different planet from your existing assembly. The Illumina data can still be used to polish out assembly errors with Racon and/or Pilon.
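Short-read polishing is usually repeated for a few rounds: map the Illumina reads to the current draft, then let Pilon correct it. The sketch below covers only the Pilon route and assumes `bwa`, `samtools` and a Pilon jar are installed. The function names, output layout, `-Xmx16G` heap size and default of three rounds are illustrative assumptions, not a recommended protocol.

```python
import subprocess
from pathlib import Path
from typing import List


def polish_round_commands(draft: str, r1: str, r2: str, out_dir: str,
                          round_no: int, pilon_jar: str = "pilon.jar",
                          threads: int = 8) -> List[List[str]]:
    """Build the commands for one Illumina polishing round (not executed here)."""
    prefix = f"round{round_no}"
    bam = str(Path(out_dir) / f"{prefix}.sorted.bam")
    return [
        ["bwa", "index", draft],
        ["bwa", "mem", "-t", str(threads), draft, r1, r2],        # SAM on stdout
        ["samtools", "sort", "-@", str(threads), "-o", bam, "-"],  # SAM from stdin
        ["samtools", "index", bam],
        # -Xmx16G is an assumed heap size; adjust to your machine
        ["java", "-Xmx16G", "-jar", pilon_jar, "--genome", draft,
         "--frags", bam, "--outdir", out_dir, "--output", prefix],
    ]


def run_round(cmds: List[List[str]]) -> None:
    """Execute one round, piping bwa mem straight into samtools sort."""
    index, mem, sort, bam_index, pilon = cmds
    subprocess.run(index, check=True)
    mem_proc = subprocess.Popen(mem, stdout=subprocess.PIPE)
    sort_proc = subprocess.Popen(sort, stdin=mem_proc.stdout)
    mem_proc.stdout.close()  # let bwa receive SIGPIPE if samtools exits early
    sort_rc = sort_proc.wait()
    mem_rc = mem_proc.wait()
    if sort_rc != 0 or mem_rc != 0:
        raise RuntimeError("read mapping or sorting failed")
    subprocess.run(bam_index, check=True)
    subprocess.run(pilon, check=True)


def polish(draft: str, r1: str, r2: str, out_dir: str, rounds: int = 3,
           pilon_jar: str = "pilon.jar") -> str:
    """Run several rounds; each round polishes the previous round's output."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for i in range(1, rounds + 1):
        run_round(polish_round_commands(draft, r1, r2, out_dir, i, pilon_jar))
        draft = str(Path(out_dir) / f"round{i}.fasta")  # Pilon writes <output>.fasta
    return draft
```

Building the command lists separately from running them makes it easy to print and inspect a round before launching it; comparing the number of changes per round is a reasonable way to decide when to stop.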

Answer written 9 months ago by colindaven.

harish (8 months ago) wrote:

It would be better to reassemble the data from the PacBio long reads and error-correct them using the Illumina reads.

Alternatively, you can do a hybrid assembly. Since your genome is only 32 Mb, you can safely use Unicycler.
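A minimal sketch of launching a Unicycler hybrid run from Python, assuming `unicycler` is on `PATH`; file names and the thread count are placeholders and the helper names are hypothetical. Unicycler's own documentation describes it as a pipeline for bacterial genomes, so a 32 Mb fungal result is worth comparing against the other approaches in this thread.

```python
import shutil
import subprocess
from typing import List


def unicycler_hybrid_command(short_r1: str, short_r2: str, long_reads: str,
                             out_dir: str, threads: int = 8) -> List[str]:
    """Unicycler hybrid mode: paired short reads (-1/-2) plus long reads (-l)."""
    return ["unicycler", "-1", short_r1, "-2", short_r2,
            "-l", long_reads, "-o", out_dir, "-t", str(threads)]


def run_unicycler(cmd: List[str]) -> None:
    """Fail early with a clear message if the executable is missing."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

For example, `run_unicycler(unicycler_hybrid_command("illumina_R1.fastq.gz", "illumina_R2.fastq.gz", "pacbio.fastq.gz", "unicycler_out"))`; the final contigs end up in `unicycler_out/assembly.fasta`.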

For much larger genomes, what I generally do is a standalone long-read assembly, then call a consensus and polish it. Then, depending on how fragmented your short-read assembly is, you can merge the two using quickmerge, GAM-NGS, etc.

Personally I have had good results using quickmerge. If you can hack the code a bit or process the steps individually, then consider using MUMmer4.
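"Process the steps individually" can be read as running the alignment stage yourself with MUMmer4, whose `nucmer` is multithreaded via `-t`, and then passing the filtered delta file to quickmerge's merge step. This is only a sketch under that assumption: the `-l` values are illustrative, the helper names are hypothetical, and the exact quickmerge invocation (including which assembly it expects as reference and which as query) should be checked in its README.

```python
import subprocess
from typing import List, Tuple


def mummer4_alignment_commands(reference: str, query: str, prefix: str,
                               threads: int = 8, min_match: int = 100,
                               min_aln_len: int = 10000) -> Tuple[List[str], List[str]]:
    """Build nucmer and delta-filter commands (values are illustrative)."""
    nucmer = ["nucmer", "-t", str(threads), "-l", str(min_match),
              "-p", prefix, reference, query]          # writes <prefix>.delta
    # -r/-q keep the best 1-to-1-style hits for reference and query; -l drops short alignments
    delta_filter = ["delta-filter", "-r", "-q", "-l", str(min_aln_len),
                    f"{prefix}.delta"]
    return nucmer, delta_filter


def run_alignment(reference: str, query: str, prefix: str, threads: int = 8) -> str:
    """Run both steps and return the path of the filtered delta file."""
    nucmer, delta_filter = mummer4_alignment_commands(reference, query, prefix, threads)
    subprocess.run(nucmer, check=True)
    filtered = f"{prefix}.rq.delta"
    with open(filtered, "w") as out:  # delta-filter prints to stdout
        subprocess.run(delta_filter, stdout=out, check=True)
    return filtered
```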

Alternatively, use DBG2OLC on your corrected PacBio reads, with your Illumina assembly as the base.

Answer written 8 months ago by harish.

