Question: To reassemble Illumina and PacBio, or just upgrade previous assembly with PacBio?
msobol wrote:

Hi,

I previously assembled a fungal genome from Illumina HiSeq paired-end reads. The assembly was ~32 Mbp and was made up of ~400 contigs. I did not try to join the contigs into scaffolds. BUSCO scored the assembly as ~98% complete based on the number of orthologs recovered.
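Since the thread compares several assembly routes, a small script for the basic contiguity numbers (contig count, total length, N50) can help judge whether a new assembly actually improves on the old one. The following is only a minimal sketch in plain Python; the function names are made up for illustration, and it assumes an uncompressed FASTA file.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence in a FASTA given as lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def assembly_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Summarise an assembly: contig count, total size, N50, longest contig."""
    lengths = fasta_lengths(lines)
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "n50": n50(lengths),
        "longest": max(lengths, default=0),
    }
```

Usage would be something like `with open("assembly.fasta") as fh: print(assembly_stats(fh))`, where `assembly.fasta` is a placeholder path. Contiguity alone does not say anything about correctness, so it is best read next to a completeness check such as the BUSCO score mentioned above.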

However, I just received PacBio reads from the same isolates and want to use them to improve the assembly and possibly close the genome. Should I reassemble the Illumina and PacBio reads together using SPAdes or some other hybrid assembler, or should I update the existing Illumina assembly with the PacBio reads using PBJelly?

Thanks in advance! Morgan

illumina pacbio assembly genome • 1.5k views
modified 14 months ago by harish • written 15 months ago by msobol

Try everything you can (a short summary of the much longer answer posted by @h.mon below).

written 15 months ago by genomax

I don't know whether you have solved your problem by now, but I never get tired of citing this article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5100563/

The authors present different assembly strategies depending on the coverage available, and also introduce a tool you may want to try: quickmerge, which merges two different kinds of assembly to improve the result.

Let me know if this information was of any use to you :)

written 14 months ago by Rox

I also recently stumbled upon quickmerge and am currently applying it to my data. I must say I have become a fan in the meantime: it runs really fast and the results are more than OK.

written 14 months ago by lieven.sterck

Glad to hear I'm not the only quickmerge fan :)

written 14 months ago by Rox

Add me to that list!

If you can hack the scripts, or compute the deltas yourself with MUMmer 4, it becomes insanely fast.

Along the same lines, you can also try CAMSA.

written 14 months ago by harish

Interesting comment, harish. I haven't gotten that deep into it yet. Care to share some insight on how to achieve the switch to MUMmer4?

EDIT: is it as easy as to point it to the location of the MUMmer4 binaries?

Thanks for the tip, will look into it.

modified 14 months ago • written 14 months ago by lieven.sterck
h.mon wrote:

edit: As soon as I posted, it occurred to me that it is probably better to perform a de novo hybrid assembly, as I think this has a better chance of correcting small duplications that may have been misassembled in the Illumina-only assembly.

end of edit.

Probably depends on the coverage for both Illumina and PacBio, according to this page: Large Genome Assembly with PacBio Long Reads

[Figure: algorithm suggestions]

In any case, do not forget to polish your assembly after incorporating PacBio reads:

On stuck records and indel errors; or “stop publishing bad genomes”

modified 5 months ago • written 15 months ago by h.mon

To add to this, I'd be inclined to go for the hybrid reassembly, as you might get some bonus short reads that map now but didn't before, giving you more confidence in some of the PacBio base calls.

written 15 months ago by Joe
colindaven wrote:

I would just assemble the new PacBio reads de novo, e.g. with Canu. I would be surprised if you didn't have 40X+ coverage. The PacBio assembly is going to be on a different planet from your existing assembly. The Illumina data can still be used to polish assembly errors with Racon and/or Pilon.
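Whether a PacBio-only assembly is realistic depends on that coverage figure, which is simply total read bases divided by genome size. Below is a minimal sketch for checking it, assuming a standard four-line-per-record FASTQ and taking the ~32 Mbp from the question as the genome size; the function names are hypothetical.

```python
from typing import Iterable, Iterator


def fastq_read_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield read lengths from a FASTQ with exactly four lines per record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            yield len(line.strip())


def estimate_coverage(read_lengths: Iterable[int], genome_size_bp: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return sum(read_lengths) / genome_size_bp
```

For example, `estimate_coverage(fastq_read_lengths(open("pacbio_reads.fastq")), 32_000_000)` would give the raw depth for a placeholder file `pacbio_reads.fastq`. This is raw coverage before any read correction or trimming, so the usable depth will be somewhat lower.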

written 15 months ago by colindaven
harish wrote:

It would be better to reassemble the data using the PacBio long reads and error-correct them using the Illumina reads.

Alternatively, you can do a hybrid assembly. Since your genome is only 32 Mb, you can safely use Unicycler.

What I generally do for much larger genomes is a standalone long-read assembly, followed by consensus calling and polishing. Then, depending on how fragmented your short-read assembly is, you can merge the two using quickmerge, GAM-NGS, etc.

Personally I have had good results using quickmerge. If you can hack the code a bit or process the steps individually, then consider using MUMmer4.
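To make "process the steps individually" concrete, here is a rough sketch of the three commands run one by one, so that the `nucmer` step can point at a MUMmer4 binary. The flags and the fixed values (`-l 100`, `-i 95`, `-hco 5.0`, `-c 1.5`) are recalled from the quickmerge README and should be treated as assumptions to verify against your installed versions. The function names and the `nucmer_bin` parameter are made up for illustration, and the two length cutoffs are left to the caller.

```python
import subprocess
from typing import List, Optional, Tuple

# (argv, file to capture stdout into, or None if the tool writes its own output)
Step = Tuple[List[str], Optional[str]]


def quickmerge_steps(
    query_fasta: str,
    reference_fasta: str,
    anchor_len_cutoff: int,
    min_aln_len: int,
    prefix: str = "out",
    nucmer_bin: str = "nucmer",  # assumption: set to the path of a MUMmer4 nucmer
) -> List[Step]:
    """Build the align / filter / merge commands without running them."""
    delta = f"{prefix}.delta"
    filtered = f"{prefix}.rq.delta"
    return [
        # 1. whole-assembly alignment (the slow step MUMmer4 speeds up)
        ([nucmer_bin, "-l", "100", "-prefix", prefix,
          reference_fasta, query_fasta], None),
        # 2. keep the best alignments; delta-filter writes to stdout
        (["delta-filter", "-i", "95", "-r", "-q", delta], filtered),
        # 3. merge the two assemblies using the filtered delta
        (["quickmerge", "-d", filtered, "-q", query_fasta, "-r", reference_fasta,
          "-hco", "5.0", "-c", "1.5",
          "-l", str(anchor_len_cutoff), "-ml", str(min_aln_len)], None),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

Building the commands separately from running them makes it easy to print and inspect them first, or to rerun only the merge step with different cutoffs while reusing the delta file.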

Alternatively, use DBG2OLC on your corrected PacBio reads taking your Illumina assembly as a base.

written 14 months ago by harish