Question: Oxford Nanopore and Illumina hybrid assembly
4
2.4 years ago by
igor7.7k
United States
igor7.7k wrote:

Are there any de novo genome assemblers that work with both Nanopore and Illumina reads?

SPAdes can take both Nanopore and Illumina reads, but it's only for prokaryotic genomes. I haven't seen anything for eukaryotic.

All the discussion and literature that I have seen so far suggests using Nanopore long reads for assembly and then polishing with Illumina short reads. However, you need a certain level of coverage for the assembly to complete (for example, Canu's recommended minimum is 20X). What if you only have 1X coverage with long reads? That will not be enough to assemble on its own, but it should still be much better than short reads alone. What's the appropriate approach for that situation?

ont nanopore assembly • 3.2k views
ADD COMMENTlink modified 14 months ago by Carambakaracho1.2k • written 2.4 years ago by igor7.7k
1

How large is your genome expected to be?

You could give SPAdes a try. As long as you are not in "human" genome territory, it may work. I recall one of the SPAdes developers writing that it could be used for larger genomes (e.g. fungal), but I can't find that post/thread at the moment.

Edit: The SPAdes manual mentions not using the --careful option for "large or medium" eukaryotic genomes, which implies such genomes are in scope. So it looks like you could certainly try it out.

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by genomax68k

It should be around 500 Mb, so it's not too big, but certainly closer to human than bacterial size.

Good point about the --careful option, but the manual also says "SPAdes is not intended for larger genomes (e.g. mammalian size genomes)", so I am not sure which part to believe.
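
To put the numbers in this thread side by side, here is a minimal Python sketch of the coverage arithmetic. It assumes the simple definition of mean coverage as total sequenced bases divided by genome size; the function names are made up for illustration, and the 500 Mb figure is only the rough estimate quoted above.

```python
def coverage(total_read_bases: float, genome_size: float) -> float:
    """Mean depth of coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


def bases_needed(target_coverage: float, genome_size: float) -> float:
    """Read bases required to reach a target mean coverage."""
    return target_coverage * genome_size


GENOME_SIZE = 500e6  # ~500 Mb, the rough estimate from this thread

# Gigabases of long reads corresponding to 1X coverage
print(bases_needed(1, GENOME_SIZE) / 1e9)

# Gigabases needed to reach a 20X minimum (the Canu figure quoted above)
print(bases_needed(20, GENOME_SIZE) / 1e9)
```

Under these assumptions, 1X corresponds to about 0.5 Gb of long reads, while 20X would need about 10 Gb.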

ADD REPLYlink written 2.4 years ago by igor7.7k

If you have some time (and I think you have the resources, if I recall from a 10x thread), go ahead and give it a try. At worst, the job will fail :)

ADD REPLYlink written 2.4 years ago by genomax68k

Good memory!

I certainly plan to give it a try. I just wanted to know if I am missing anything and to have some alternatives in case it fails.

ADD REPLYlink written 2.4 years ago by igor7.7k

Trinity, I think, is the best option for Nanopore reads in hybrid assembly.

ADD REPLYlink written 2.4 years ago by Buffo1.6k

Do you have a source for that? Because on github I find the following:

Trinity assembles transcript sequences from Illumina RNA-Seq data.

ADD REPLYlink written 2.4 years ago by WouterDeCoster39k

I should've specified it's genome assembly, not transcriptome. Trinity is for RNA-seq.

ADD REPLYlink written 2.4 years ago by igor7.7k

Oh, sorry, that's true, it is for RNA-seq. What about IDBA_hybrid? You can use Nanopore reads as a reference.

ADD REPLYlink written 2.4 years ago by Buffo1.6k

Hi there, I'm new to the subject but I will soon be facing the same questions. So far I have found only SPAdes and ALLPATHS-LG that do this.

With better coverage, what would be the best approach? Using a pipeline to assemble de novo with Nanopore and Illumina data together, or assembling the genome with Nanopore data and then correcting it with Illumina data? Or even completing an Illumina draft genome with Nanopore data?

Thank you very much,

ADD REPLYlink modified 14 months ago • written 14 months ago by lagartija60

Nanopore still lacks performance, and its cost/performance ratio remains high. I think PacBio is the best option for long reads and for completing fragmented assemblies (from Illumina). Where are you from, lagartija? I know your name :).

ADD REPLYlink written 14 months ago by Buffo1.6k

Actually, I already have the Nanopore reads, so I can't change that. By the way, do you know the difference between SPAdes and SPAdes-Hybrid? It seems that both can do hybrid assembly...

So you know my name? Do you mean lagartija or my real name? Haha, I'm from France. But I'm also Argentinian and Norwegian. And you? Italian?

ADD REPLYlink written 13 months ago by lagartija60

No, it is not the same. With SPAdes you can use 'trusted contigs' for de novo assemblies, but not reads. On the other hand, SPAdes hybrid can perform de novo assemblies from long and short reads :). I'm from the Congo, but I have lived in America for years; I know lagartijas XD.

ADD REPLYlink written 13 months ago by Buffo1.6k

Aaah, I see. And how do I get the trusted contigs? And are they needed for both Illumina and Nanopore?

ADD REPLYlink written 13 months ago by lagartija60
1

You can use old assemblies as trusted contigs (from the same or a closely related species); using genomes that are not closely related is not recommended (in SPAdes). If you don't have access to old assemblies (or none exist), a de novo hybrid assembly is the only option. And yes, you can use reads from Nanopore and Illumina for hybrid assemblies with SPAdes-Hybrid.

ADD REPLYlink written 13 months ago by Buffo1.6k

Only if you have them from some other source (e.g. an illumina only assembly).

ADD REPLYlink written 13 months ago by genomax68k

Because from what I see here, SPAdes takes reads: http://spades.bioinf.spbau.ru/release3.10.1/manual.html
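
As a rough sketch of what the manual describes: SPAdes takes the paired Illumina reads through `-1`/`-2` and the long reads through `--nanopore`, while `--trusted-contigs` is a separate, optional input. The Python helper below only builds the command line and does not run SPAdes. The function name, file names, thread count, and memory limit are all placeholders for illustration, and the flags should be checked against the manual for your SPAdes version.

```python
import shlex
from typing import List, Optional


def build_spades_command(
    illumina_r1: str,
    illumina_r2: str,
    nanopore_reads: str,
    out_dir: str,
    trusted_contigs: Optional[str] = None,
    threads: int = 16,       # placeholder value
    memory_gb: int = 128,    # placeholder value
) -> List[str]:
    """Build (but do not run) a SPAdes hybrid assembly command."""
    cmd = [
        "spades.py",
        "-1", illumina_r1,             # Illumina forward reads
        "-2", illumina_r2,             # Illumina reverse reads
        "--nanopore", nanopore_reads,  # Nanopore long reads
        "-t", str(threads),
        "-m", str(memory_gb),          # memory limit in GB
        "-o", out_dir,
    ]
    if trusted_contigs is not None:
        # Optional: contigs from an earlier assembly of the same organism
        cmd += ["--trusted-contigs", trusted_contigs]
    return cmd


# Hypothetical file names, for illustration only
cmd = build_spades_command(
    "illumina_R1.fastq.gz",
    "illumina_R2.fastq.gz",
    "nanopore.fastq.gz",
    "spades_hybrid_out",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.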

ADD REPLYlink written 13 months ago by lagartija60
5
2.4 years ago by
jblommaert9270
jblommaert9270 wrote:

Just thought I'd add the few options I've seen:

1) OPERA-LG https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0951-y

2) PacBio recommendations may be relevant here https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Large-Genome-Assembly-with-PacBio-Long-Reads

3) LINKS https://gigascience.biomedcentral.com/articles/10.1186/s13742-015-0076-3

4) This workflow http://biorxiv.org/content/early/2016/05/22/054783

And this other question may be useful too: "Gap-filling and scaffolding using PacBio reads"

DISCLAIMER: I haven't tried any of these yet, but I'm also planning a Nanopore-Illumina hybrid assembly soon.

ADD COMMENTlink written 2.4 years ago by jblommaert9270
1

Those are excellent suggestions!

I should have been looking for "scaffolding" rather than "hybrid assembly", which is probably more appropriate in my case.

ADD REPLYlink written 2.4 years ago by igor7.7k
2
14 months ago by
Carambakaracho1.2k
Switzerland/Basel
Carambakaracho1.2k wrote:

Besides the almost obvious SPAdes, I recommend looking into the MaSuRCA assembler. I had very good results for PacBio/Illumina and Nanopore/Illumina data, though my long read coverage was in all cases a little bit higher than what you describe.

BTW, SPAdes easily handles metagenome assemblies well beyond 1 Gbp, and in the latest version the error messages about memory consumption were improved, so you'll find out pretty early whether it works or not. Give it a try; once it has assembled, I'd even try the --careful option. Whether it works depends more on the memory available on your machine (probably 128 GB or more) and the k-mer complexity of your genome than on its taxonomic domain.

ADD COMMENTlink modified 4 months ago • written 14 months ago by Carambakaracho1.2k