How to improve a genome assembly using Dovetail and PacBio assemblies?
I have more of a conceptual question. I have two genome assemblies from the same plant: one generated with Dovetail technology (~998 Mb) and the other a PacBio HiFi assembly (~1.1 Gb). The Dovetail assembly is more contiguous but has lower base quality, whereas the PacBio assembly has higher base quality but is more fragmented. The plant is diploid with relatively high heterozygosity.

I tried using RagTag patch to improve the assembly, but it gave no improvement.
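To judge whether a step such as RagTag patch actually changed contiguity, a common first check is to compare sequence counts, total length and N50 before and after. Below is a minimal Python sketch, assuming the assemblies are plain FASTA files; the helper names and file names are hypothetical. Note that Dovetail scaffolds usually contain runs of N at gaps, and this sketch counts them, so it reports scaffold N50, which is not directly comparable to HiFi contig N50.

```python
import sys
from typing import Iterable, List, Optional


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each record in FASTA-formatted lines.

    Accepts any iterable of lines, e.g. an open file handle.
    Gap characters (N) are counted, so scaffolds give scaffold lengths.
    """
    lengths: List[int] = []
    current: Optional[int] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length of the shortest sequence among the longest sequences
    that together cover at least half of the total length."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Usage (placeholder names): python n50.py dovetail.fa hifi.fa
    for path in sys.argv[1:]:
        with open(path) as handle:
            lens = fasta_lengths(handle)
        print(f"{path}\t{len(lens)} sequences\t{sum(lens)} bp\tN50={n50(lens)}")
```

Running it on the assembly before and after patching shows whether the number of sequences dropped and N50 rose; if neither moved, the patch step found few joins it could support.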

Is there a way to use both assemblies and produce a hybrid assembly with high contiguity and base quality?

Please share your thoughts.


gconcepcion

Our standard recommendation is to use PacBio HiFi data plus Hi-C data (Dovetail, Arima Genomics, Phase Genomics) to get an assembly of high quality in terms of both contiguity and accuracy. Generally, a hifiasm assembly will work well with Hi-C data from any of the companies listed.

There are two strategies you can consider here:

1) Re-assemble your genome with hifiasm using your HiFi data plus the raw Hi-C reads to generate a haplotype-resolved assembly (which can subsequently be scaffolded).

2) Use hifiasm (or your favorite assembler) to generate contigs that are subsequently scaffolded with the Hi-C data.
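The practical difference between the two strategies is whether the Hi-C reads are given to hifiasm itself. The following minimal Python sketch only builds the command line, assuming hifiasm's documented -o (output prefix), -t (threads) and --h1/--h2 (Hi-C mate files) options. The helper names and file names are hypothetical, and the output file names reflect recent hifiasm releases, so check the documentation for the version you install.

```python
import shlex
from typing import List, Optional, Sequence


def hifiasm_command(prefix: str, hifi_reads: Sequence[str], threads: int = 32,
                    hic_r1: Optional[str] = None,
                    hic_r2: Optional[str] = None) -> List[str]:
    """Build a hifiasm argument list.

    Strategy 1: pass both Hi-C mates (--h1/--h2) for Hi-C integrated,
    haplotype-resolved assembly.
    Strategy 2: omit them for a HiFi-only run whose contigs are
    scaffolded with Hi-C afterwards.
    """
    if (hic_r1 is None) != (hic_r2 is None):
        raise ValueError("Hi-C mode needs both mate files (--h1 and --h2)")
    cmd = ["hifiasm", "-o", prefix, "-t", str(threads)]
    if hic_r1 is not None:
        cmd += ["--h1", hic_r1, "--h2", hic_r2]
    cmd += list(hifi_reads)
    return cmd


def primary_and_haplotype_gfas(prefix: str, hic: bool) -> List[str]:
    """Contig GFA names written by recent hifiasm versions (verify for yours)."""
    tag = "hic" if hic else "bp"
    return [f"{prefix}.{tag}.{part}.gfa"
            for part in ("p_ctg", "hap1.p_ctg", "hap2.p_ctg")]


if __name__ == "__main__":
    # Placeholder file names; once checked, run with subprocess.run(cmd, check=True).
    cmd = hifiasm_command("plant", ["hifi.fq.gz"],
                          hic_r1="hic_R1.fq.gz", hic_r2="hic_R2.fq.gz")
    print(shlex.join(cmd))
    print(primary_and_haplotype_gfas("plant", hic=True))
```

For a heterozygous diploid, the hap1/hap2 contig sets from the Hi-C run are the haplotype-resolved assemblies mentioned in strategy 1; either set (or the primary contigs from a HiFi-only run) can then go to a scaffolder.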

Several scaffolding tools are publicly available, but each company handles this a little differently, so you should consult Dovetail for their current recommendations on which software to use.
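Whichever scaffolder Dovetail recommends, hifiasm writes contigs as GFA, while scaffolders generally take FASTA. The conversion only needs the S (segment) lines, which carry the contig name and sequence. This is a minimal Python sketch; the function name and file names are hypothetical, and it assumes the GFA stores sequences (a "*" in the sequence field means it does not).

```python
import sys
from typing import Iterable, Iterator


def gfa_to_fasta(gfa_lines: Iterable[str], width: int = 80) -> Iterator[str]:
    """Yield FASTA lines (without newlines) from GFA segment (S) records."""
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # skip header, link, alignment and other non-segment records
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        if seq == "*":
            raise ValueError(f"segment {name} has no sequence stored in the GFA")
        yield ">" + name
        # Wrap the sequence at a fixed line width
        for start in range(0, len(seq), width):
            yield seq[start:start + width]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (placeholder names): python gfa2fa.py plant.hic.hap1.p_ctg.gfa > hap1.fa
    with open(sys.argv[1]) as gfa:
        for out_line in gfa_to_fasta(gfa):
            print(out_line)
```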

As for "using both assemblies to produce a hybrid assembly": while that may be possible, I'm not aware of any software that would produce a better assembly that way than either of the two strategies I mentioned.

Thank you very much for your comment. This makes complete sense. I will consult with Dovetail regarding the point you mentioned.


