Question: Oxford Nanopore and Illumina hybrid assembly
Asked 3.8 years ago by igor (United States):

Are there any de novo genome assemblers that work with both Nanopore and Illumina reads?

SPAdes can take both Nanopore and Illumina reads, but it's only for prokaryotic genomes. I haven't seen anything for eukaryotic genomes.

All the discussion and literature that I have seen so far suggests using Nanopore long reads for assembly and then polishing with Illumina short reads. However, you need a certain level of coverage for the assembly to complete (for example, Canu's recommended minimum is 20X). What if you only have 1X coverage with long reads? That will not be enough to assemble on its own, but it should be much better than short reads alone. What's the appropriate approach for that situation?

Tags: ont, nanopore, assembly
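The coverage figures discussed here (Canu's 20X minimum versus 1X) are just total sequenced bases divided by genome size. A minimal Python sketch, assuming a plain, uncompressed FASTQ file with exactly four lines per record; the helper names and the file name `nanopore_reads.fastq` are hypothetical. The example plugs in the ~500 Mb genome size mentioned later in the thread:

```python
from typing import Iterable


def total_bases_from_fastq(lines: Iterable[str]) -> int:
    """Sum read lengths from a four-line-per-record FASTQ (e.g. an open file)."""
    total = 0
    for i, line in enumerate(lines):
        # In a four-line FASTQ record, the sequence is the second line of each record.
        if i % 4 == 1:
            total += len(line.strip())
    return total


def coverage(total_bases: int, genome_size: int) -> float:
    """Mean depth of coverage: sequenced bases divided by genome length."""
    return total_bases / genome_size


def bases_needed(target_coverage: float, genome_size: int) -> int:
    """Number of sequenced bases required to reach a target mean coverage."""
    return round(target_coverage * genome_size)


if __name__ == "__main__":
    genome_size = 500_000_000  # ~500 Mb, the size given later in the thread
    print(bases_needed(20, genome_size))  # Canu's recommended minimum: 10000000000
    print(bases_needed(1, genome_size))   # the 1X scenario: 500000000
    # Hypothetical file name; uncomment to measure an actual read set:
    # with open("nanopore_reads.fastq") as fh:
    #     print(coverage(total_bases_from_fastq(fh), genome_size))
```

In other words, 1X for a 500 Mb genome is about 0.5 Gb of long reads, whereas 20X would be about 10 Gb.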

How large is your genome expected to be?

You could give SPAdes a try. As long as you are not in "human" genome territory, it may work. I recall one of the SPAdes developers writing that it could be used for larger genomes (e.g. fungal genomes), but I can't find that post/thread at the moment.

Edit: the SPAdes manual recommends not using the --careful option for "large or medium" eukaryotic genomes, which implies SPAdes can be run on them. So it looks like you could certainly try it out.

(reply by genomax, 3.8 years ago)

It should be around 500 Mb, so it's not too big, but certainly closer to human than to bacterial size.

Good point about the --careful option, but the manual also says "SPAdes is not intended for larger genomes (e.g. mammalian size genomes)", so I am not sure which part to believe.

(reply by igor, 3.8 years ago)

If you have some time (and I think you have the resources, if I recall correctly from a 10x thread), go ahead and give it a try. At worst, the job will fail :)

(reply by genomax, 3.8 years ago)

Good memory!

I certainly plan to give it a try. I just wanted to know if I am missing anything and to have some alternatives in case it fails.

(reply by igor, 3.8 years ago)

Can it be run within 256 GB of RAM?

(reply by cutevishal020, 11 months ago)
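Since the advice in this thread is mostly "just try SPAdes", here is a minimal Python sketch that builds (but does not run) a hybrid SPAdes command line. It assumes one pair of Illumina FASTQ files plus one Nanopore file; the file names, thread count and the helper name `spades_hybrid_cmd` are hypothetical. `-m` is SPAdes' memory limit in GB (SPAdes stops if it reaches it), which is the setting relevant to the 256 GB question, although it cannot tell you in advance whether a given genome will fit. `--careful` is off by default, following the manual's advice for larger eukaryotic genomes.

```python
import shlex
from typing import List


def spades_hybrid_cmd(
    illumina_r1: str,
    illumina_r2: str,
    nanopore: str,
    outdir: str,
    threads: int = 16,
    memory_gb: int = 250,  # 250 should match SPAdes' default limit; check your version's manual
    careful: bool = False,
) -> List[str]:
    """Build a spades.py argument list for a paired Illumina + Nanopore run."""
    cmd = [
        "spades.py",
        "-1", illumina_r1,        # forward Illumina reads
        "-2", illumina_r2,        # reverse Illumina reads
        "--nanopore", nanopore,   # long reads, used by hybridSPAdes for gap closing/repeat resolution
        "-o", outdir,             # output directory
        "-t", str(threads),       # number of threads
        "-m", str(memory_gb),     # memory limit in GB; SPAdes terminates if it is reached
    ]
    # --careful is left off unless requested: the manual advises against it
    # for large and medium eukaryotic genomes.
    if careful:
        cmd.append("--careful")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    cmd = spades_hybrid_cmd("illumina_R1.fastq", "illumina_R2.fastq",
                            "nanopore.fastq", "hybrid_out", memory_gb=240)
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string keeps file names with spaces safe and lets the same list be passed to `subprocess.run` later if you want to launch the job from Python.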

I think Trinity is the best option for Nanopore reads in hybrid assembly.

(reply by Buffo, 3.8 years ago)

Do you have a source for that? Because on GitHub I find the following:

Trinity assembles transcript sequences from Illumina RNA-Seq data.

(reply by WouterDeCoster, 3.8 years ago)

I should've specified it's genome assembly, not transcriptome. Trinity is for RNA-seq.

(reply by igor, 3.8 years ago)

Oh, sorry, that's true, it is for RNA-seq. What about IDBA_hybrid? You can use Nanopore reads as a reference.

(reply by Buffo, 3.8 years ago)

Hi there, I'm new to the subject, but I will soon be facing the same questions. So far I have only found SPAdes and ALLPATHS-LG that do that.

With better coverage, what would be the best approach? Using a pipeline to assemble de novo with Nanopore and Illumina data together, assembling the genome with Nanopore data and then correcting it with Illumina data, or even completing a draft Illumina genome with Nanopore data?

Thank you very much,

(reply by lagartija, 2.6 years ago)

Nanopore is still lacking in performance, and its cost/performance ratio remains high. I think PacBio is the best option for long reads and for completing fragmented assemblies (from Illumina). Where are you from, lagartija? I know your name :).

(reply by Buffo, 2.6 years ago)

Actually, I already have the Nanopore reads, so I can't change that. By the way, do you know the difference between SPAdes and SPAdes-Hybrid? It seems that both can do hybrid assembly...

So you know my name? Do you mean lagartija or my real name? Haha. I'm from France, but I'm also Argentinian and Norwegian. And you? Italian?

(reply by lagartija, 2.6 years ago)

No, it's not the same: with SPAdes you can use 'trusted contigs' for de novo assemblies, but not reads. On the other hand, SPAdes-Hybrid can perform de novo assemblies from long and short reads :). I'm from the Congo, but I have lived in America for years, so I know lagartijas XD.

(reply by Buffo, 2.6 years ago)

Aaah, I see. And how do I get the trusted contigs? And does it work with both Illumina and Nanopore?

(reply by lagartija, 2.6 years ago)

You can use old assemblies as trusted contigs (from the same species or a closely related one); using genomes that are not closely related is not recommended (in SPAdes). If you don't have access to old assemblies (or none exist), a de novo hybrid assembly is the only option. And yes, you can use reads from Nanopore and Illumina for hybrid assemblies with SPAdes-Hybrid.

(reply by Buffo, 2.6 years ago)

Only if you have them from some other source (e.g. an illumina only assembly).

(reply by genomax, 2.6 years ago)

But from what I see here, SPAdes does take reads: http://spades.bioinf.spbau.ru/release3.10.1/manual.html

(reply by lagartija, 2.6 years ago)
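genomax's suggestion of taking trusted contigs from another source, such as an Illumina-only assembly, can be sketched as two SPAdes runs. As before, this Python sketch only builds the command lines; the file and directory names and both helper names are hypothetical, and `contigs.fasta` is the file SPAdes writes its final contigs to in the output directory. The thread does not establish whether feeding back an assembly made from the same reads improves the result; an earlier assembly of the same species, as Buffo suggests, would be passed in the same way.

```python
import os
import shlex
from typing import List


def illumina_only_cmd(r1: str, r2: str, outdir: str) -> List[str]:
    """Step 1: a short-read-only SPAdes assembly."""
    return ["spades.py", "-1", r1, "-2", r2, "-o", outdir]


def with_trusted_contigs_cmd(
    r1: str, r2: str, nanopore: str, trusted: str, outdir: str
) -> List[str]:
    """Step 2: a hybrid run that also receives contigs from another source."""
    return [
        "spades.py",
        "-1", r1,
        "-2", r2,
        "--nanopore", nanopore,
        "--trusted-contigs", trusted,  # e.g. an Illumina-only or earlier assembly
        "-o", outdir,
    ]


if __name__ == "__main__":
    # Hypothetical file and directory names.
    r1, r2, ont = "illumina_R1.fastq", "illumina_R2.fastq", "nanopore.fastq"
    step1_dir = "illumina_only"
    # SPAdes writes its final contigs to contigs.fasta inside the output directory.
    trusted = os.path.join(step1_dir, "contigs.fasta")
    for cmd in (
        illumina_only_cmd(r1, r2, step1_dir),
        with_trusted_contigs_cmd(r1, r2, ont, trusted, "hybrid_trusted"),
    ):
        print(shlex.join(cmd))
```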
jblommaert92 wrote:

Just thought I'd add the few options I've seen:

1) OPERA-LG https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0951-y

2) PacBio recommendations may be relevant here https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Large-Genome-Assembly-with-PacBio-Long-Reads

3) LINKS https://gigascience.biomedcentral.com/articles/10.1186/s13742-015-0076-3

4) This workflow http://biorxiv.org/content/early/2016/05/22/054783

And this other question may be useful too: Gap-filling and scaffolding using PacBio reads

DISCLAIMER: I haven't tried any of these yet, but I'm also planning a Nanopore-Illumina hybrid assembly soon.

(answer by jblommaert92, 3.8 years ago)

Those are excellent suggestions!

I should have been looking for "scaffolding" rather than "hybrid assembly", which is probably more appropriate in my case.

(reply by igor, 3.8 years ago)
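To make the scaffolding idea concrete: long reads whose alignments touch two different contigs suggest those contigs are neighbours, and links supported by several reads are more trustworthy than single ones. The toy Python sketch below assumes the alignments have already been reduced to hypothetical (read, contig, position-on-read) tuples and ignores orientation and gap size; it illustrates only the general principle and is not how OPERA-LG or LINKS work internally. With 1X long-read coverage, few reads will span any given gap, so `min_support` has to stay low and the resulting links are correspondingly less reliable.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical input format: (read_id, contig_id, position of the hit on the read)
Alignment = Tuple[str, str, int]

EXAMPLE_ALIGNMENTS: List[Alignment] = [
    ("r1", "ctgA", 0), ("r1", "ctgB", 5000),
    ("r2", "ctgA", 100), ("r2", "ctgB", 4000),
    ("r3", "ctgB", 0), ("r3", "ctgC", 3000),
]


def contig_links(alignments: List[Alignment], min_support: int = 2) -> Dict[Tuple[str, str], int]:
    """Count how many long reads place two contigs next to each other."""
    hits_per_read = defaultdict(list)
    for read_id, contig, read_pos in alignments:
        hits_per_read[read_id].append((read_pos, contig))

    counts: Counter = Counter()
    for hits in hits_per_read.values():
        # Order the contigs by where they hit along the read.
        ordered = [contig for _, contig in sorted(hits)]
        for a, b in zip(ordered, ordered[1:]):
            if a != b:
                # Toy simplification: orientation is ignored, so store the pair sorted.
                counts[tuple(sorted((a, b)))] += 1

    # Keep only links supported by at least min_support reads.
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    print(contig_links(EXAMPLE_ALIGNMENTS))  # {('ctgA', 'ctgB'): 2}
```

Lowering `min_support` to 1 would also keep the single-read ctgB-ctgC link, which is exactly the kind of weakly supported join that low long-read coverage forces you to consider.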
Carambakaracho (Germany/Cologne) wrote:

Besides the almost obvious SPAdes, I recommend looking into the MaSuRCA assembler. I had very good results with PacBio/Illumina and Nanopore/Illumina data, though in all cases my long-read coverage was a little higher than what you describe.

BTW, SPAdes easily handles metagenome assemblies well beyond 1 Gbp, and in the latest version the error messages on memory consumption were improved, so you'll find out pretty early whether it works or not. Give it a try; once it has assembled, I'd even try the --careful option. Whether it works depends more on the available memory on your machine (probably 128 GB or more) and the k-mer complexity of your genome than on its taxonomic domain.

(answer by Carambakaracho, 2.6 years ago)
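Carambakaracho's point about k-mer complexity can be explored on a subsample of reads by counting distinct canonical k-mers (a k-mer and its reverse complement counted once). The Python sketch below is only a rough, illustrative proxy: it is not how SPAdes estimates its memory use, the helper names are hypothetical, and a Python set over a full 500 Mb dataset would itself need a lot of memory, so dedicated k-mer counting tools are the practical choice at that scale.

```python
from typing import Iterable, Set

# Maps each base to its complement; used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def distinct_kmers(reads: Iterable[str], k: int = 21) -> int:
    """Count distinct canonical k-mers across reads, skipping k-mers containing N."""
    seen: Set[str] = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:  # skip ambiguous positions
                continue
            seen.add(canonical(kmer))
    return len(seen)


if __name__ == "__main__":
    # ACG/CGT collapse to ACG, and GTA/TAC collapse to GTA.
    print(distinct_kmers(["ACGTACGT"], k=3))  # 2
```

Sequencing errors create novel k-mers, so error-prone long reads inflate this count; the Illumina reads are the more sensible input for this kind of estimate.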
Powered by Biostar version 2.3.0