Hi,
I am looking for a suite of benchmark codes/scripts that take 1000 Genomes datasets (e.g., those published at http://aws.amazon.com/1000genomes, s3://1000genomes) as input and do some non-trivial work (queries) on them. Since the purpose is benchmarking new parallel algorithms, I do not care exactly what work the suite performs, as long as it is 1) compute-intensive (from a computer scientist's perspective) and 2) meaningful (from a biologist's perspective). What I have in mind is something like LINPACK or TeraSort. For the sake of simplicity, it can use or chain off-the-shelf tools such as samtools and vcftools. Can anyone point me (someone with little knowledge of DNA) in the right direction?
Thanks in advance.