Question

Help with hybrid genome assembly

0

7.4 years ago

walid.cbs • 0

I am preparing genomic shotgun libraries for sequencing on the PacBio Single Molecule Real-Time (SMRT) platform (20X coverage of PacBio reads), coupled with sequencing on the Illumina platform (30X coverage of Illumina MiSeq 250 bp reads). The data will be used for a de novo hybrid assembly that combines the Illumina and PacBio reads to generate an accurate draft assembly of a plant genome (estimated size 350 Mbp).

1- What type of assembler should I use?
2- Is a computer with 320 GB of RAM, a 3 TB hard disk and 2 x Xeon 2680v2 sufficient to analyze the sequencing results and perform a hybrid assembly?

Assembly next-gen sequencing genome • 2.8k views

ADD COMMENT • link updated 7.4 years ago by h.mon 35k • written 7.4 years ago by walid.cbs • 0

2

The best method to figure out question 1 is probably to search for papers which did something similar, there should be plenty. Combining long reads with short reads is typically called a 'hybrid assembly'.

ADD REPLY • link 7.4 years ago by WouterDeCoster 48k

0

Thanks for your reply. Would you recommend paid software such as the CLC assembler, or is free software such as ALLPATHS-LG sufficient for this type of hybrid assembly?

For my second question, do you have any idea what workstation specifications are required to perform a hybrid assembly?

ADD REPLY • link 7.4 years ago by walid.cbs • 0

2

Canu is one of the go-to long-read assemblers. Googling "hybrid assembly" quickly returned the following pipeline:

https://nanoporetech.com/resource-centre/publications/hybrid-assembly-pipeline-released-using-canu-racon-and-pilon

I would guess that the workstation will be capable of running the assembly (provided you aren't trying to run too many in parallel). Probably the first thing to affect you is that 3 TB is not a lot of storage when it comes to large bioinformatics datasets. If you've just got short- and long-read data for a handful of genomes, though, I think you'll be fine.
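As a rough back-of-the-envelope check on the storage point, here is a small Python sketch using the numbers from the question (350 Mbp genome, 20X PacBio, 30X Illumina with 250 bp reads). The function names are made up for illustration, and the figure of 2 bytes per base for uncompressed FASTQ (one byte for the base call, one for its quality score, ignoring read headers) is an assumption, not a measured value.

```python
# Back-of-the-envelope estimate of raw read volume for the planned sequencing.
# All helper names are hypothetical; bytes_per_base=2.0 is an assumed
# approximation for uncompressed FASTQ (base + quality, headers ignored).

def total_bases(genome_size_bp: int, coverage: int) -> int:
    """Sequenced bases needed to reach the given average coverage."""
    return genome_size_bp * coverage

def read_count(bases: int, read_length: int) -> int:
    """Approximate number of fixed-length reads that yield `bases`."""
    return bases // read_length

def fastq_size_gb(bases: int, bytes_per_base: float = 2.0) -> float:
    """Approximate uncompressed FASTQ size in GB (10^9 bytes)."""
    return bases * bytes_per_base / 1e9

GENOME_SIZE_BP = 350_000_000  # estimated genome size from the question

pacbio_bases = total_bases(GENOME_SIZE_BP, 20)    # 20X PacBio
illumina_bases = total_bases(GENOME_SIZE_BP, 30)  # 30X Illumina
illumina_reads = read_count(illumina_bases, 250)  # 250 bp MiSeq reads
raw_gb = fastq_size_gb(pacbio_bases + illumina_bases)

print(f"PacBio bases:   {pacbio_bases:,}")
print(f"Illumina bases: {illumina_bases:,}")
print(f"Illumina reads: {illumina_reads:,}")
print(f"Raw FASTQ (approx.): {raw_gb:.1f} GB")
```

Under these assumptions the raw reads come to a few tens of GB, which is small next to 3 TB. Intermediate files from read correction and assembly can be many times larger than the input, so this estimate says nothing about peak disk usage during the run.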

ADD REPLY • link 7.4 years ago by Joe 22k

1

There is definitely enough free software available that can handle this job. I don't work in plant genomics, but I've seen quite a few recent publications in which hybrid assembly is used for plants, combining short Illumina reads with long PacBio/Nanopore reads. There is no need to use commercial software.

ADD REPLY • link 7.4 years ago by WouterDeCoster 48k

score 1 · Answer 1 · 2018-02-26

If you do not have access to a cluster for error-correcting the PacBio reads with the Illumina reads, DBG2OLC should do a great job at hybrid assembly given your data stats. Tune the parameters according to your data; apart from repeat content, high heterozygosity also leads to fragmentation. For assembling the Illumina-only unitigs for DBG2OLC, minia is recommended. You can later polish the genome with Pilon or Racon.

score 1 · Answer 2 · 2018-02-26

Allpaths-LG generally ranks well in papers comparing assemblies (e.g. Assemblathon 2), and it can perform hybrid assemblies.

The hardware requirements for an assembly depend partly on the amount and type of sequencing data, partly on the program used for assembly, and partly on genome characteristics (genome size, complexity, ploidy, heterozygosity, repeat content, and so on). It seems to me that the hardware you have can handle your plant genome, but only performing the assembly will tell.