Has anyone used the 1000 Genomes public data set available on Amazon S3?
Or, I should ask: has anyone used the BAM files directly through an AWS service such as Elastic MapReduce?
I can download the files to EBS, unpack them, and re-upload them to S3, but that is more expensive (and more work) than using the free public copy directly.
Thank you for any insight, Justin
Edit: I am currently looking into Hadoop-BAM: http://sourceforge.net/projects/hadoop-bam/