News: First near-complete assembly of the bread wheat genome, *Triticum aestivum* - check data/compute resources used
6.5 years ago
GenoMax 141k

The first near-complete assembly of the hexaploid bread wheat genome was just published.

The assembly used a hybrid data source (Illumina and PacBio). The final assembly contains 15,344,693,583 bases and has a weighted average (N50) contig size of 232,659 bases.
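For anyone unfamiliar with N50: it is the contig length at which contigs of that length or longer hold at least half of the assembled bases. Here is a minimal Python sketch of that definition. The helper name `n50` and the toy contig lengths are my own illustration, not the wheat data or the authors' code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Walk the contigs from longest to shortest and stop at the first
    contig where the running total reaches half of the assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


# Toy example (made-up lengths, 32 bases in total).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

On the toy list the two 8-base contigs already cover 16 of 32 bases, so the function reports 8.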

Data:

The first data set consisted of 7.06 billion Illumina reads containing approximately 1 trillion bases of DNA. The Illumina reads were 150-bp, paired reads from short DNA fragments, averaging 400 bp in length. Using an estimated genome size of 15.3 Gbp, this represented 65-fold coverage of the genome. The second data set used Pacific Biosciences single-molecule (SMRT) technology to generate 55.5 million reads with an average read length just under 10,000 bp, containing a total of 545 billion bases of DNA, representing 36-fold coverage of the genome.
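The coverage figures above are easy to sanity-check from the quoted totals. This is a rough sketch in Python. The variable names are mine, and the inputs are the rounded numbers from the quote, so the results are approximate.

```python
def fold_coverage(total_bases, genome_size):
    """Fold coverage = sequenced bases divided by genome size."""
    return total_bases / genome_size


GENOME_SIZE = 15.3e9      # estimated genome size in bp (from the quote)

# Illumina: 7.06 billion reads of 150 bp, "approximately 1 trillion bases"
illumina_reads = 7.06e9
illumina_bases_from_reads = illumina_reads * 150   # a bit over 1 trillion
illumina_cov = fold_coverage(1.0e12, GENOME_SIZE)  # uses the rounded total

# PacBio: 55.5 million reads, 545 billion bases in total
pacbio_reads = 55.5e6
pacbio_bases = 545e9
pacbio_cov = fold_coverage(pacbio_bases, GENOME_SIZE)
pacbio_mean_read_len = pacbio_bases / pacbio_reads

print(f"Illumina coverage: {illumina_cov:.1f}x")
print(f"PacBio coverage:   {pacbio_cov:.1f}x")
print(f"PacBio mean read:  {pacbio_mean_read_len:.0f} bp")
```

The results round to 65-fold and 36-fold, and the PacBio mean read length comes out just under 10,000 bp, all consistent with the quoted text.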

The data analysis is similarly spectacular (and this is only a part of it). Assemblers used: MaSuRCA, Celera Assembler and FALCON.

The total CPU time was ~470,000 CPU hours (53.7 years), which was only made feasible by running it on a grid with thousands of jobs running in parallel (the maximum number was 3,320) for some of the major steps. The total elapsed time was just over 5 months.

When combined with the earlier steps, the entire assembly process took 6.5 months!
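To put the CPU-time figures quoted above in perspective, here is a back-of-the-envelope calculation in Python. The month and year lengths are my assumptions, and the "best case" ignores queue waits and serial steps, so treat the outputs as rough estimates rather than anything the authors reported.

```python
HOURS_PER_YEAR = 24 * 365       # assumed: non-leap year
HOURS_PER_MONTH = 24 * 30.44    # assumed: average month length

cpu_hours = 470_000             # total CPU time (from the quote)
max_parallel_jobs = 3_320       # peak number of concurrent jobs (from the quote)
elapsed_months = 5              # "just over 5 months" of elapsed time

# CPU time expressed in years of a single core
cpu_years = cpu_hours / HOURS_PER_YEAR

# Hypothetical best case: every step runs at peak parallelism the whole time
best_case_days = cpu_hours / max_parallel_jobs / 24

# Average number of cores kept busy over the actual elapsed time
avg_busy_cores = cpu_hours / (elapsed_months * HOURS_PER_MONTH)

print(f"CPU years:           {cpu_years:.1f}")
print(f"Best-case wall time: {best_case_days:.1f} days")
print(f"Average busy cores:  {avg_busy_cores:.0f}")
```

The 53.7 CPU-year figure checks out. The gap between roughly six days at peak parallelism and five months of elapsed time suggests that only some of the major steps could run that wide, which matches the wording in the quote.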
