News: First near-complete assembly of the bread wheat genome, *Triticum aestivum* - check the data/compute resources used

6.5 years ago

GenoMax 141k

The first near-complete assembly of the hexaploid bread wheat genome was just published.

The assembly was built from hybrid data (Illumina and PacBio). The final assembly contains 15,344,693,583 bases and has a contig N50 of 232,659 bases (half of all assembled bases lie in contigs at least this long).
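N50 is sometimes loosely called a weighted average, but it is really a length-weighted median: sort the contigs from longest to shortest and walk down the list until the running total reaches half of the assembly size; the length of the contig where that happens is the N50. A minimal Python sketch of that definition follows. The function name `n50` and the toy contig lengths are hypothetical, chosen for illustration, and are not taken from the paper or from any assembler's code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases).

    The N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2*running with total to avoid float division.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly: 1,500 bases in five contigs.
print(n50([100, 200, 300, 400, 500]))  # 500 + 400 = 900 >= 750, so N50 = 400
```

For the wheat assembly, an N50 of 232,659 bases means that contigs of at least that length account for at least half of the ~15.3 Gbp assembled.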

Data:

The first data set consisted of 7.06 billion Illumina reads containing approximately 1 trillion bases of DNA. The Illumina reads were 150-bp paired-end reads from short DNA fragments averaging 400 bp in length. Using an estimated genome size of 15.3 Gbp, this represented 65-fold coverage of the genome.

The second data set used Pacific Biosciences single-molecule real-time (SMRT) sequencing to generate 55.5 million reads with an average read length just under 10,000 bp. These contained a total of 545 billion bases of DNA, representing 36-fold coverage of the genome.
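The coverage figures are simply total sequenced bases divided by the estimated genome size. The short Python check below reproduces them from the numbers quoted above. The helper name `fold_coverage` is hypothetical, and the rounding to whole-fold values is an assumption about how the published figures were reported.

```python
GENOME_SIZE_BP = 15.3e9  # estimated genome size quoted in the post


def fold_coverage(total_bases, genome_size=GENOME_SIZE_BP):
    """Average sequencing depth: total sequenced bases / genome size."""
    return total_bases / genome_size


illumina_bases = 1e12    # "approximately 1 trillion bases"
pacbio_bases = 545e9     # 545 billion bases
pacbio_reads = 55.5e6    # 55.5 million reads

print(f"Illumina: {fold_coverage(illumina_bases):.1f}x")  # about 65x
print(f"PacBio:   {fold_coverage(pacbio_bases):.1f}x")    # about 36x

# Sanity check on the quoted read lengths.
print(f"7.06e9 reads x 150 bp = {7.06e9 * 150:.3e} bases")        # roughly 1 trillion
print(f"Mean PacBio read length: {pacbio_bases / pacbio_reads:,.0f} bp")  # just under 10,000
```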

The data analysis is similarly spectacular (and this is only part of it). The assemblers used were MaSuRCA, Celera Assembler and FALCON.

The total CPU time was ~470,000 CPU hours (53.7 years). This was feasible only because some of the major steps ran on a grid with thousands of jobs in parallel (up to 3,320 at once). The total elapsed time was just over 5 months.
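The "53.7 years" figure is just CPU hours divided by hours per year. Dividing the CPU hours by the elapsed wall-clock time also gives the average number of CPUs kept busy, which makes a useful contrast with the 3,320-job peak. In the sketch below, the helper names are hypothetical, "just over 5 months" is approximated as 152 days (an assumption), and a grid job is not necessarily one core, so the peak and the average are not strictly comparable.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 h; ignores leap years


def cpu_hours_to_years(cpu_hours):
    """Express total CPU time as years of a single CPU."""
    return cpu_hours / HOURS_PER_YEAR


def mean_concurrency(cpu_hours, elapsed_days):
    """Average number of CPUs busy over the elapsed wall-clock time."""
    return cpu_hours / (elapsed_days * 24)


CPU_HOURS = 470_000
ELAPSED_DAYS = 152  # assumption: "just over 5 months" taken as ~152 days

print(f"{cpu_hours_to_years(CPU_HOURS):.1f} CPU-years")          # 53.7
print(f"~{mean_concurrency(CPU_HOURS, ELAPSED_DAYS):.0f} CPUs busy on average")
```

An average on the order of a hundred-plus busy CPUs against a peak of 3,320 parallel jobs suggests the workload was bursty: a few massively parallel steps surrounded by long, less parallel ones.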

When combined with the earlier steps, the entire assembly process took 6.5 months!

wheat genome • 1.3k views

updated 11 months ago by Ram 43k • written 6.5 years ago by GenoMax 141k