Question: How To Access BAM Files Directly Via Hadoop?
jtal04 wrote, 6.8 years ago:

Has anyone used the 1000 genomes public data set available on Amazon s3?

Or, I should ask: has anyone used the BAM files directly via an AWS service such as Elastic MapReduce?

I can download the files to EBS, unpack them, and re-upload them to S3, but that is more expensive (and more work) than using the public/free copy.

Thank you for any insight, Justin

Edit: I am currently looking into Hadoop-BAM: http://sourceforge.net/projects/hadoop-bam/

Tags: 1000genomes, bam
Mikael Huss (Stockholm) wrote, 6.8 years ago:

You already answered your own question: you could use Hadoop-BAM. You might also want to check out SeqPig, which lets you run Pig queries against your BAM files (among other things).

JC (Mexico) wrote, 6.8 years ago:

Did you read and try the tutorial? http://www.1000genomes.org/using-1000-genomes-data-amazon-web-service-cloud


Yes. It says to access the data the generic way you access S3 data, then describes how to start an EC2 instance from their AMI, and the rest is a tutorial that is not specific to AWS. I guess the title of my question is misleading: my real problem is using BAM files in Elastic MapReduce, which doesn't know how to split records in a BAM file.
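For context on why a stock MapReduce input format cannot split a BAM: BAM data is stored as BGZF, a series of independently compressed gzip blocks, each of which records its own compressed length in a "BC" extra subfield of its gzip header. A splitter therefore has to land exactly on a block boundary, which is essentially what Hadoop-BAM arranges for you. A minimal sketch of parsing that header in Python, using the spec-defined 28-byte BGZF EOF block as sample input (names like `bgzf_block_size` are mine, not from any library):

```python
import struct

# The 28-byte BGZF end-of-file block defined in the SAM/BAM spec;
# every BGZF block starts with the same gzip header layout.
BGZF_EOF = bytes.fromhex(
    "1f8b08040000000000ff0600424302001b0003000000000000000000"
)

def bgzf_block_size(buf, offset=0):
    """Return the compressed size of the BGZF block starting at `offset`,
    or None if no valid BGZF header is found there."""
    hdr = buf[offset:offset + 18]
    if len(hdr) < 18:
        return None
    # gzip magic (1f 8b), deflate method (08), FEXTRA flag set (04)
    if hdr[0:4] != b"\x1f\x8b\x08\x04":
        return None
    xlen = struct.unpack_from("<H", hdr, 10)[0]  # length of the extra field
    # Walk the extra subfields looking for the BGZF subfield
    # (SI1=66, SI2=67), whose 2-byte payload is BSIZE = block length - 1.
    pos, end = 12, 12 + xlen
    while pos + 4 <= end:
        si1, si2 = hdr[pos], hdr[pos + 1]
        slen = struct.unpack_from("<H", hdr, pos + 2)[0]
        if si1 == 66 and si2 == 67 and slen == 2:
            bsize = struct.unpack_from("<H", buf, offset + pos + 4)[0]
            return bsize + 1
        pos += 4 + slen
    return None

print(bgzf_block_size(BGZF_EOF))  # 28: the EOF block is 28 bytes long
```

A splitter can scan forward from an arbitrary byte offset for a position where this function returns a size (rather than None) and start its split there; record boundaries within a block still have to be recovered from the decompressed BAM stream.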

(reply by jtal04)

Oh, I get it. Please edit your post; maybe someone here knows the answer.

(reply by JC)
Powered by Biostar version 2.3.0