Question: Comparing Hi-C/Dovetail, BioNano, and PacBio assemblies. Pick the best one?
mmats010 wrote:

We work with a VERY complex genome. It is a pathogen with a large genome of ~240Mb. By complex, I mean that we have done PacBio sequencing and FALCON assembly, yet only got an N50 of about 0.6Mb.

In order to consolidate our genome into manageable segments (e.g. pseudochromosomes), we decided to use both BioNano optical maps and Dovetail Hi-C. Both methods relied on high molecular weight DNA.

Alas, neither method significantly improved our assembly in terms of N50, even though both arranged the FALCON PacBio contigs in different ways. Dovetail increased the N50 from about 0.60Mb to 1.15Mb, and a relaxed version of the BioNano pipeline increased the N50 to 916kb. The default parameters, of course, gave lower values.

MY QUESTION IS... are there any commonly used programs that can consolidate Hi-C/optical map/PacBio assemblies? Many of the examples I see rely solely on Illumina assemblies, but those typically include mate pair libraries, which nonetheless don't contain the same kind of data as our Illumina PE datasets + optical maps + Hi-C maps. I have looked around and found "Metassembler" and "GAM-NGS", as well as "runBNG" and "BioNanoAnalyst", but our group isn't very experienced in this kind of de novo assembly, especially with a VERY difficult genome.

Any advice would be appreciated.

Mike

modified 3 months ago by Philipp Bayer • written 3 months ago by mmats010

Both methods relied on high molecular weight DNA.

And is that a problem for this organism?

Not an answer to your question, but it might be an idea to use some nanopore reads. With careful extraction, manipulation and library prep you can get reads of hundreds of kb (longest 970kb). That might be able to span complex sequences...

written 3 months ago by WouterDeCoster

I mention the HMW DNA because even though we have had success isolating it, the technologies we've employed to utilize it haven't really worked. Something chromatin-based, like what Phase Genomics does, might be better for us, though it isn't really an option now.

A group down the hall from us actually has a nanopore, but they haven't spoken very highly of it in the time that they've been using it. Perhaps we could ask anyway.

written 3 months ago by mmats010

a relaxed version of the BioNano pipeline increased the N50 to 916 Mb.

How can a ~240Mb genome have a N50 of 916Mb?
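
For reference, N50 is the contig (or scaffold) length L such that pieces of length L or longer cover at least half of the total assembly, so it can never be larger than the assembly itself. A minimal Python sketch of that definition follows; the contig lengths are made-up numbers for illustration, not values from this genome.

```python
def n50(lengths):
    """Return the N50 of a list of contig/scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest piece down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp, purely illustrative.
contigs = [1_200_000, 900_000, 600_000, 300_000, 200_000, 100_000]
print(n50(contigs))
```

Because the returned value is always one of the input lengths, an N50 larger than the genome size points to a unit typo rather than a real result.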

written 3 months ago by h.mon

Whoops, meant to write "Kb" there

written 3 months ago by mmats010
Australia/Perth/UWA
Philipp Bayer wrote:

So there are a few papers which did what you're trying to do.

In the goat genome paper they used both HiC and BioNano and found that HiC worked a bit better for them: http://www.nature.com/ng/journal/v49/n4/full/ng.3802.html They used Lachesis for HiC scaffolding with optimised parameters (somewhere in the supplementary), and then merged the HiC and BioNano scaffolds. Look at the supplementary for the details; it's not straightforward (a lot of it looks like manual checking of mummer output).

There is also a recent mosquito genome which uses their own pipeline to scaffold, but no BioNano was used: http://science.sciencemag.org/content/early/2017/03/22/science.aal3327.full The pipeline is here: https://github.com/theaidenlab/3d-dna

Lastly, maybe you can do what the most recent wheat genome did and merge your two assemblies using mummer, though I'm not 100% sure how exactly: http://www.biorxiv.org/content/biorxiv/early/2017/07/03/159111.full.pdf

I have not tried GAM-NGS or Metassembler, but runBNG and BioNanoAnalyst both won't merge your assemblies (I'm co-author on those two papers)
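
As a starting point for the mummer-based comparison mentioned above, here is a hedged Python sketch that only builds (and optionally runs) a standard MUMmer sequence: nucmer to align the two scaffold sets, delta-filter -1 to keep one-to-one alignments, and show-coords to write a tab-delimited table for manual checking. The file names and the prefix are hypothetical, and this is not the procedure from any of the papers above; it only produces alignments to inspect and does not merge anything.

```python
import subprocess


def mummer_commands(ref_fasta, qry_fasta, prefix="hic_vs_bionano"):
    """Build (argv, stdout_path) pairs; stdout_path is None if nothing is redirected."""
    delta = f"{prefix}.delta"
    filtered = f"{prefix}.1to1.delta"
    coords = f"{prefix}.coords"
    return [
        # Align the query scaffolds to the reference scaffolds; writes <prefix>.delta.
        (["nucmer", f"--prefix={prefix}", ref_fasta, qry_fasta], None),
        # Keep only one-to-one alignments (rearrangements are still allowed).
        (["delta-filter", "-1", delta], filtered),
        # Tab-delimited coordinates, sorted by reference, with coverage and lengths.
        (["show-coords", "-r", "-c", "-l", "-T", filtered], coords),
    ]


def run(commands):
    """Run the commands in order; requires MUMmer to be installed and on PATH."""
    for argv, out_path in commands:
        if out_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(out_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


# Hypothetical input file names; print the commands as a dry run.
commands = mummer_commands("hic_scaffolds.fasta", "bionano_scaffolds.fasta")
for argv, out_path in commands:
    print(" ".join(argv) + (f" > {out_path}" if out_path else ""))
```

Calling run(commands) would execute the three steps; the resulting .coords file is the kind of output that then needs manual checking for conflicts between the two scaffold sets.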

written 3 months ago by Philipp Bayer

Thanks, I'll take a look at those!

I think I understand why runBNG can't merge our Hi-C and BioNano assemblies, but could we not simply take the optical maps and the Hi-C fasta output, and assemble those together using runBNG? It sounds like, while it can't quite merge the fasta files from our previous BioNano and Hi-C assemblies, it can at least use a better starting material (N50=1.15Mb vs N50=0.60Mb) for the optical assembly.

written 3 months ago by mmats010

Yes, runBNG can scaffold your assembly fasta using the BioNano data, that would work!

BTW there's also OMSim, which can simulate optical mapping data from your assembly: https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btx293/3791407/OMSim-a-simulator-for-optical-map-data

written 3 months ago by Philipp Bayer