9.2 years ago
William ★ 5.2k
- "March 20, 2014 | The Broad Institute has announced that a new version of the Genome Analysis Toolkit (GATK), version 3.1, has been released and has been optimized for Intel Advanced Vector Extensions (Intel AVX) found in Intel Xeon-based servers. The improvements account for faster variant calling, achieving three to five times overall improvement in variant discovery, enabling a whole genome to be analyzed in one day rather than three."
- "For example, Intel suggested a way to rethink how duplicates are marked in the BAM file. “It was an amazingly insightful observation,” Banks says. “It was a big conceptual improvement to get that done.”"
- “Instead of having to take 26,000 samples of sequencing data, stick them all into memory at the same time, and try to find SNPs for all of them, you can do it one sample at a time… It’s very cheap, and it means the data doesn’t have to all be together in the same file system. It’s technically much simpler and computationally much faster. At the end there’s a joint genotyping step, which is very cheap because it’s done in the… VCF.”
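For anyone who wants to see what "one sample at a time, then a cheap joint genotyping step" might look like in practice, here is a rough sketch. The quote does not name the tools, so I am assuming it refers to the GATK 3.x workflow of running HaplotypeCaller in reference-confidence (GVCF) mode per sample and then GenotypeGVCFs across all samples. The jar path and file names are hypothetical placeholders, and the script only builds the command lines rather than running them. Some 3.x releases may need extra indexing arguments, so check the documentation for your version.

```python
from typing import List

# Hypothetical location of the GATK 3.x jar; adjust for your installation.
GATK_JAR = "GenomeAnalysisTK.jar"


def per_sample_cmd(ref: str, bam: str, gvcf: str) -> List[str]:
    """Build the per-sample calling command (one BAM in, one GVCF out)."""
    return [
        "java", "-jar", GATK_JAR,
        "-T", "HaplotypeCaller",
        "-R", ref,
        "-I", bam,
        "--emitRefConfidence", "GVCF",  # emit a per-sample GVCF
        "-o", gvcf,
    ]


def joint_genotype_cmd(ref: str, gvcfs: List[str], out_vcf: str) -> List[str]:
    """Build the joint genotyping command over all per-sample GVCFs."""
    cmd = ["java", "-jar", GATK_JAR, "-T", "GenotypeGVCFs", "-R", ref]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]  # one --variant argument per sample
    cmd += ["-o", out_vcf]
    return cmd


if __name__ == "__main__":
    ref = "reference.fasta"            # hypothetical file names
    samples = ["sampleA", "sampleB"]

    # Step 1: each sample is processed independently (easy to parallelise).
    for s in samples:
        print(" ".join(per_sample_cmd(ref, s + ".bam", s + ".g.vcf")))

    # Step 2: a single joint genotyping pass over the small GVCF files.
    gvcfs = [s + ".g.vcf" for s in samples]
    print(" ".join(joint_genotype_cmd(ref, gvcfs, "cohort.vcf")))
```

The point of the split is that step 1 never needs more than one sample's reads at a time, so the BAMs can live on different machines, and only the much smaller GVCFs have to be brought together for step 2.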
Wow, some big improvements in the GATK 3.x versions. See the whole article here:
Great! It seems that Intel is going through some of the standard tools and trying to make them faster (I know they've at least looked at bowtie2 as well) :)