Question: A Question About Hybrid Assembly
7.7 years ago by
United States
Lhl730 wrote:

Hi there,

I have sequence read data produced by both 454 and Illumina technologies. I assembled the 454 reads with Newbler and the Illumina reads with Velvet.

Now I want to combine all the data and do a more complete assembly.

Does anyone know how to do a hybrid assembly combining both data types (starting either from the reads or from the assembled contigs)?

Thanks a lot!

PS: I have been thinking of pooling the 454 and Illumina contigs and removing the redundancy based on identity, or of reciprocally BLASTing the two sets of contigs against each other to identify corresponding contigs and remove the redundant ones. However, I am not sure these are good strategies, and I would love to know other options.

assembly illumina • 7.7k views
modified 7.6 years ago by Benm710 • written 7.7 years ago by Lhl730
7.7 years ago by
Benm710 wrote:

Few assembly tools can handle hybrid assembly of NGS data, and this puzzles me too, because the read types and error types differ between platforms. Here is a great review that introduces the differences: Michael L. Metzker, Sequencing technologies - the next generation. Nat Rev Genet. 2010 Jan;11(1):31-46. Epub 2009 Dec 8. Review.

Some software may do it well, although none of these suited my own needs: MIRA - Whole Genome Shotgun and EST Sequence Assembler for Sanger, 454 and Solexa / Illumina

ABySS - Assembly By Short Sequences - a de novo sequence assembler designed for very short reads.

ALLPATHS - a whole genome shotgun assembler that can generate high quality genome assemblies using short reads such as those produced by the new generation of sequencers.

ALLPATHS-LG - an updated version of ALLPATHS. It works on both small and large (mammalian size) genomes.

Or you can assemble the 454 reads with Newbler and the Illumina reads with Velvet, as you did, and then combine the contigs with one of the following: PHRAP (a program for assembling shotgun DNA sequence data, suitable for Sanger and 454 reads; it uses an overlap-layout-consensus algorithm), CAP3/PCAP (CAP3 is for small-scale assembly of EST sequences with or without quality values; PCAP is for large-scale assembly of genomic sequences with quality values and with or without forward-reverse read pairs), or Euler (a new approach to fragment assembly that abandons the classical "overlap - layout - consensus" paradigm that is used in all currently available assembly tools). If you want to construct scaffolds, you can try SSPACE (a tool for scaffolding pre-assembled contigs), PE-Assembler (a de novo assembler using short paired-end reads), etc.

On this question, I can also recommend an article: Miller JR, Koren S, Sutton G. Assembly algorithms for next-generation sequencing data. Genomics. 2010 Jun;95(6):315-27. Epub 2010 Mar 6.

modified 7.3 years ago by Michael Kuhn5.0k • written 7.7 years ago by Benm710

Thanks a lot for your detailed answer!

written 7.7 years ago by Lhl730

Euler is just an early de Bruijn assembler; in principle, it is the same as ALLPATHS, Velvet, and ABySS.

written 5.3 years ago by Ketil3.9k
7.7 years ago by
Martin A Hansen3.0k wrote:

The problem is that the 454 and Illumina platforms yield different types of data, each with inherent problems. No hybrid assembler exists that takes into account the different error types that come with the respective data types. I suggest shredding your Velvet contigs into artificial 454 reads and feeding these to Newbler along with the original 454 data.

I have successfully used this approach before and written a Biopieces for the shredding:

It is of course also possible to shred Newbler contigs and feed them to Velvet. I haven't tried this, since my gut feeling tells me that Newbler will do the best job. (Also, shred_seq currently doesn't produce paired-end reads; no one has requested this.)
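The shredding idea can be sketched in a few lines of Python. This is only an illustration, not the shred_seq tool itself: the function names, the 300 bp fragment length, and the 100 bp step are assumptions chosen for the example, and like shred_seq it produces unpaired fragments only.

```python
def shred(seq, frag_len=300, step=100):
    """Yield (offset, fragment) windows of frag_len bases, one every step bases.

    frag_len and step are illustrative defaults, not recommended settings.
    """
    if frag_len <= 0 or step <= 0:
        raise ValueError("frag_len and step must be positive")
    if not seq:
        return
    if len(seq) <= frag_len:
        # Contig is no longer than one fragment: emit it unchanged.
        yield 0, seq
        return
    last_start = len(seq) - frag_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        # Add a final window so the end of the contig is covered too.
        starts.append(last_start)
    for start in starts:
        yield start, seq[start:start + frag_len]


def shred_contigs(contigs, frag_len=300, step=100):
    """Turn (name, sequence) pairs into FASTA text of artificial reads."""
    lines = []
    for name, seq in contigs:
        for offset, frag in shred(seq, frag_len, step):
            # Hypothetical naming scheme: contig name plus window offset.
            lines.append(f">{name}_shred_{offset}")
            lines.append(frag)
    return "".join(line + "\n" for line in lines)
```

The resulting FASTA text would then be given to the second assembler together with the real reads. A step smaller than the fragment length makes consecutive fragments overlap, which is what lets the assembler rejoin them.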

Alternatively, IDBA is supposed to be able to do hybrid assemblies, but in my hands it always segfaults.

Also, Ray is supposed to be able to do hybrid assemblies and, more interestingly, scaffolding as well. I haven't tested this, but I think you can feed Newbler contigs and Illumina reads to Ray.

written 7.7 years ago by Martin A Hansen3.0k

Thanks for your suggestion, which is very practical! I will give it a go!

written 7.7 years ago by Lhl730

We did something like this for the fire ant genome: we assembled and scaffolded the Illumina data with SOAPdenovo, "shredded" the scaffolds into overlapping 300 bp sequences, then provided these as FASTA to Newbler along with our true 454 reads.

written 7.5 years ago by Yannick Wurm2.3k
7.7 years ago by
Ketil3.9k wrote:

FWIW, I've gotten the best results using Newbler on the 454 data, and then using SSPACE to build scaffolds from the Illumina reads. I measure quality by aligning the Illumina reads to the result and counting the fraction of matched reads and matched pairs, and also by counting EST matches and a handful of fosmid ends.
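As a rough sketch of the read-based part of such a check: assuming the reads have been aligned back to the assembly with some aligner that writes SAM, the fractions of mapped reads and properly paired reads can be counted from the FLAG column. The function name and the returned keys are made up for this example, and what counts as a "proper pair" depends on the aligner and its settings.

```python
def mapping_stats(sam_lines):
    """Count mapped reads and proper pairs from an iterable of SAM text lines."""
    total = mapped = paired = proper = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # secondary/supplementary: count each read only once
        total += 1
        if not flag & 0x4:  # 0x4 set means the read is unmapped
            mapped += 1
        if flag & 0x1:  # read comes from a paired library
            paired += 1
            if flag & 0x2:  # aligner marked the pair as properly aligned
                proper += 1
    return {
        "reads": total,
        "mapped_fraction": mapped / total if total else 0.0,
        "proper_pair_fraction": proper / paired if paired else 0.0,
    }
```

Typical use would be `mapping_stats(open("reads_vs_assembly.sam"))`, where the file name is hypothetical. Comparing these fractions across candidate assemblies of the same read set gives a measure that does not depend on contig length alone.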

Runner-up method is CLC, which works decently on Illumina, but seems to be inferior to Newbler on 454 data. Celera also seems to be inferior to Newbler in my attempts at using it, and doesn't deal well with the amounts of Illumina data. I've not been able to get anything useful out of Velvet or SOAPdenovo, in spite of frequent praise and successful projects.

Quite likely, the optimal strategy depends on the types and amounts of data, and the characteristics of the genome you're trying to assemble.

Bottom line is, try a variety of software, and make sure you measure the quality with whatever means you have - and that means something beyond N50.
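For reference, N50 itself is simple to compute, which is part of why it is not enough on its own: it looks only at contig lengths, not at whether the contigs are correct. A minimal sketch (the function name is an assumption for this example):

```python
def n50(lengths):
    """Return the contig length L such that contigs of length >= L
    together hold at least half of the total assembled bases."""
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # unreachable for positive lengths; kept for completeness
```

Joining two contigs raises N50 whether or not the join is right, so a misassembled genome can score better than a correct but fragmented one.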

written 7.7 years ago by Ketil3.9k

SSPACE does improve the assembly. Thanks a lot, Ketil.

written 7.6 years ago by Lhl730
7.7 years ago by
2184687-1231-83-4.9k wrote:

You could try assembling with 454, then adding the Illumina reads on top, then closing the gaps with something like IMAGE:

written 7.7 years ago by 2184687-1231-83-4.9k

Thanks avilella, this is a good suggestion. Unfortunately, in my case I built my draft genome from the Illumina data, and the assembly of the 454 data produced far fewer contigs than its Illumina counterpart.

written 7.7 years ago by Lhl730
7.7 years ago by
Marina Manrique1.3k wrote:

For a hybrid assembly I'd do a de novo assembly with MIRA (

I think that's commonly used in hybrid assemblies but I'm not completely sure

written 7.7 years ago by Marina Manrique1.3k

MIRA currently struggles with hybrid assemblies with lots of Illumina data. One might try with a limited number of reads (and lots of memory!).

written 7.7 years ago by Martin A Hansen3.0k

Sure! With MIRA definitely use as much memory as you can!

written 7.7 years ago by Marina Manrique1.3k