Question: Help with hybrid genome assembly
22 days ago by
walid.cbs0 wrote:

I am preparing genomic shotgun libraries for sequencing on the PacBio Single Molecule Real Time (SMRT) platform (20X coverage of PacBio reads), coupled with sequencing on the Illumina platform (30X coverage of Illumina MiSeq 250 bp reads). The resulting data will be used for a de novo hybrid assembly, combining the Illumina and PacBio reads to generate an accurate draft assembly of a plant genome (estimated size 350 Mbp).

1- What type of assembler should I use?
2- Is a computer equipped with 320 RAM, a 3 TB hard disk and 2 x Xeon 2680v2 sufficient to analyze the sequencing results and perform a hybrid assembly?

modified 20 days ago by h.mon12k • written 22 days ago by walid.cbs0

The best way to answer question 1 is probably to search for papers that did something similar; there should be plenty. Combining long reads with short reads is typically called a 'hybrid assembly'.

modified 22 days ago • written 22 days ago by WouterDeCoster26k

Thanks for your reply. Would you recommend paid software such as the CLC assembler, or is free software such as ALLPATHS-LG sufficient for this type of hybrid assembly?

For my second question, do you have any idea what workstation specifications are required to perform a hybrid assembly?

written 22 days ago by walid.cbs0

Canu is one of the go-to long-read assemblers. Googling for hybrid assembly quickly returned the following hybrid assembly pipeline:

I would guess that the workstation will be capable of running the assembly (provided you aren't trying to run too many in parallel). The first limit you are likely to hit is storage: 3 TB is not a lot when it comes to large bioinformatics datasets. If you've just got short- and long-read data for a handful of genomes, though, I think you'll be fine.
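To put rough numbers on the storage point, here is a back-of-the-envelope sketch using the figures from the question (350 Mbp genome, 30X Illumina at 250 bp, 20X PacBio). The function names and the figure of about 2 bytes per base for uncompressed FASTQ are assumptions made for illustration, not measurements, and the intermediate files written by an assembler can be considerably larger than the raw reads.

```python
def sequencing_volume(genome_size_bp, coverage, read_length_bp=None):
    """Return (total bases, read count) needed for a target coverage.

    The read count is None when no read length is given.
    """
    total_bases = genome_size_bp * coverage
    n_reads = total_bases // read_length_bp if read_length_bp else None
    return total_bases, n_reads


def fastq_size_gb(total_bases, bytes_per_base=2.0):
    # Assumption: roughly 2 bytes per base in uncompressed FASTQ
    # (one sequence character plus one quality character),
    # ignoring read headers and newlines.
    return total_bases * bytes_per_base / 1e9


GENOME_SIZE = 350_000_000  # 350 Mbp, as stated in the question

illumina_bases, illumina_reads = sequencing_volume(GENOME_SIZE, 30, 250)
pacbio_bases, _ = sequencing_volume(GENOME_SIZE, 20)

print(f"Illumina: {illumina_bases:,} bases in {illumina_reads:,} reads")
print(f"PacBio:   {pacbio_bases:,} bases")
print(f"Raw reads, rough estimate: {fastq_size_gb(illumina_bases + pacbio_bases):.0f} GB")
```

Under these assumptions the raw reads for one genome come to tens of gigabytes, so the 3 TB disk would mostly be consumed by intermediate and output files rather than by the input data.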

written 22 days ago by jrj.healey3.7k

There is definitely enough free software available that can handle this job. I don't work in plant genomics, but I've seen quite a few recent publications in which hybrid assembly is used for plants, combining short Illumina reads with long PacBio/Nanopore reads. There is no need to use commercial software.

written 22 days ago by WouterDeCoster26k
20 days ago by
European union
Rohit1.3k wrote:

If you do not have access to a cluster for error-correcting the PacBio reads with the Illumina reads, DBG2OLC should do a great job at hybrid assembly with your data stats. Tune the parameters according to your data. Besides repeat content, high heterozygosity also leads to fragmentation. For assembling the Illumina-only unitigs that DBG2OLC needs, Minia is recommended. You can later polish the genome with Pilon or Racon.
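One common way to judge whether a parameter change reduced fragmentation is to compare the contig N50 of the resulting assemblies. The sketch below is a minimal, tool-independent illustration. The contig lengths are made-up values, and in practice they would be taken from the assembler's output FASTA.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # no contigs given


# Hypothetical contig lengths from two runs with different parameters.
run_a = [100, 80, 50, 30, 20, 10, 10]
run_b = [200, 60, 20, 10, 10]

print("run A N50:", n50(run_a))
print("run B N50:", n50(run_b))
```

Both hypothetical runs assemble the same total length, but run B concentrates it in fewer, longer contigs and so gets the higher N50. N50 only measures contiguity, not correctness, so it is best read alongside other checks.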

20 days ago by
h.mon12k wrote:

ALLPATHS-LG generally ranks well in papers comparing assemblies (e.g. Assemblathon 2), and it can perform hybrid assemblies.

The hardware requirements for an assembly depend partly on the amount and type of sequencing data, partly on the program used for assembly, and partly on genome characteristics (genome size, complexity, ploidy, heterozygosity, repeat content, and so on). It seems to me that the hardware you have can handle your plant genome, but only performing the assembly will tell.
