Question: help to do genome assembly
19 months ago by
walid.cbs0 wrote:

I am preparing genomic shotgun libraries for sequencing on the PacBio Single Molecule Real Time (SMRT) platform (20X coverage of PacBio reads), coupled with sequencing on the Illumina platform (30X coverage of Illumina MiSeq 250 bp reads). The data will be used for a de novo hybrid assembly combining the Illumina and PacBio reads, to generate an accurate draft assembly of a plant genome (estimated size 350 Mbp).

1- What type of assembler should I use?
2- Is a computer equipped with 320 RAM, a 3 TB hard disk, and 2 x Xeon 2680v2 sufficient to analyze the sequencing results and perform a hybrid assembly?

modified 18 months ago by h.mon27k • written 19 months ago by walid.cbs0

The best way to answer question 1 is probably to search for papers that did something similar; there should be plenty. Combining long reads with short reads is typically called a 'hybrid assembly'.

modified 19 months ago • written 19 months ago by WouterDeCoster40k

Thanks for your reply. Would you recommend paid software such as the CLC assembler, or is free software such as ALLPATHS-LG adequate for this type of hybrid assembly?

For my second question, do you have any idea what workstation specifications are required to perform a hybrid assembly?

written 19 months ago by walid.cbs0

Canu is one of the go-to long read assemblers. Googling for hybrid assembly quickly returned the following hybrid assembly pipeline:

I would guess that the workstation will be capable of running the assembly (provided you aren't trying to run too many in parallel). The first limit you are likely to hit is storage: 3 TB is not a lot when it comes to large bioinformatics datasets. If you only have short and long read data for a handful of genomes, though, I think you'll be fine.
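As a rough back-of-the-envelope check on the storage point, here is a minimal Python sketch that turns the numbers from the question (350 Mbp genome, 20X PacBio, 30X Illumina at 250 bp) into total bases, read counts, and an approximate raw FASTQ size. The function names are made up for illustration, and the figure of about 2 bytes per base (one for the base call, one for its quality score, ignoring read headers and compression) is an assumption, not a measured value.

```python
def sequencing_yield(genome_size_bp, coverage, read_length_bp=None):
    """Return (total bases, read count) needed for the requested coverage.

    The read count is None when no fixed read length is given
    (e.g. PacBio reads, whose lengths vary).
    """
    total_bases = genome_size_bp * coverage
    if read_length_bp is None:
        return total_bases, None
    reads = -(-total_bases // read_length_bp)  # ceiling division
    return total_bases, reads


def fastq_size_gb(total_bases, bytes_per_base=2.0):
    """Approximate uncompressed FASTQ size in GB.

    Assumption: ~2 bytes per base (sequence + quality), headers ignored.
    """
    return total_bases * bytes_per_base / 1e9


GENOME_SIZE = 350_000_000  # 350 Mbp, as estimated in the question

pacbio_bases, _ = sequencing_yield(GENOME_SIZE, 20)
illumina_bases, illumina_reads = sequencing_yield(GENOME_SIZE, 30, 250)

print(f"PacBio:   {pacbio_bases / 1e9:.1f} Gbp, ~{fastq_size_gb(pacbio_bases):.0f} GB raw")
print(f"Illumina: {illumina_bases / 1e9:.1f} Gbp, {illumina_reads:,} reads, "
      f"~{fastq_size_gb(illumina_bases):.0f} GB raw")
```

Under these assumptions the raw reads for one genome come to a few tens of GB. Assemblers write intermediate files that can be many times larger than the input, so treat this as a lower bound on disk usage rather than a prediction.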

written 19 months ago by Joe14k

There is definitely enough free software available that can handle this job. I don't work in plant genomics, but I've seen quite a few recent publications in which hybrid assembly is used for plants, combining short Illumina reads with long PacBio/Nanopore reads. There is no need to use commercial software.

written 19 months ago by WouterDeCoster40k
18 months ago by
Rohit1.4k wrote:

If you do not have access to a cluster for error-correcting the PacBio reads with Illumina reads, DBG2OLC should do a great job at hybrid assembly with your data stats. Tune the parameters according to your data; high heterozygosity, in addition to repeat content, leads to fragmentation. For assembling the Illumina-only unitigs for DBG2OLC, Minia is recommended. You can later polish the genome with Pilon or Racon.

18 months ago by
h.mon27k wrote:

Allpaths-LG generally ranks well in papers comparing assemblies (e.g. Assemblathon 2), and it can perform hybrid assemblies.

The hardware requirements for an assembly depend partly on the amount and type of sequencing data, partly on the program used for assembly, and partly on genome characteristics (genome size, complexity, ploidy, heterozygosity, repeat content, and so on). It seems to me that the hardware you have can handle your plant genome, but only running the assembly will tell.
