Question

Genome data on Hadoop at chromosome level

0

9.3 years ago

shalini.ravishankar • 0

Hello Everyone,

I am an IT student working with Hadoop on a human genome project. My first problem is how to store the genome data in a Hadoop cluster. How do I store the data chromosome-wise?

We have a cluster of 30 machines running Hadoop, and we plan to use it to process human genome data. The data is in the form of BAM files. I know that if I load the data into HDFS, it will automatically be split into blocks and stored across the data nodes. That is the problem: I can't have the data split like that. I need to split the data chromosome-wise so that we can run bioinformatics algorithms on each chromosome.

Bio algorithm computing: for instance, bisulfite methylation extraction.

Currently we use bismap (a Python tool). Is there a way to store the data chromosome-wise on Hadoop and run the bismap command as MapReduce jobs?
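
To make the layout being asked about concrete, here is a minimal sketch of one possible approach, not a tested pipeline. It only builds the shell commands that would extract each chromosome from a BAM file with `samtools view` and upload the result to its own HDFS directory with `hdfs dfs -put`. The file name `sample.bam`, the HDFS directory `/genome/sample`, and the `chr1`..`chrY` naming are assumptions for illustration; the input BAM would need to be coordinate-sorted and indexed for region extraction to work.

```python
import shlex

# Assumed chromosome names; they must match the names in the BAM header
# (some references use "1", "2", ... instead of "chr1", "chr2", ...).
CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def split_command(bam_path, chrom):
    """Build the samtools command that writes one chromosome to its own BAM.

    Region extraction requires a coordinate-sorted, indexed input BAM.
    """
    return ["samtools", "view", "-b", "-o", f"{chrom}.bam", bam_path, chrom]


def upload_commands(chrom, hdfs_dir):
    """Build the HDFS commands that place one chromosome BAM in its own directory."""
    target = f"{hdfs_dir}/{chrom}"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", target],
        ["hdfs", "dfs", "-put", f"{chrom}.bam", target + "/"],
    ]


def plan(bam_path, hdfs_dir, chromosomes=CHROMOSOMES):
    """Return the full ordered list of commands: split, then upload, per chromosome."""
    commands = []
    for chrom in chromosomes:
        commands.append(split_command(bam_path, chrom))
        commands.extend(upload_commands(chrom, hdfs_dir))
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the plan can be reviewed first.
    for cmd in plan("sample.bam", "/genome/sample"):
        print(shlex.join(cmd))
```

HDFS will still store each per-chromosome file as blocks internally, but with one file per chromosome a job can treat each file as a single unit of work, for example by giving each map task one file path to process.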

nga hadoop genome chromosome • 2.7k views

ADD COMMENT • link updated 2.1 years ago by Ram 43k • written 9.3 years ago by shalini.ravishankar • 0

2

Hello shalini.ravishankar!

It appears that your post has been cross-posted to another site: SEQanswers.

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY • link updated 2.1 years ago by Ram 43k • written 9.3 years ago by Devon Ryan 104k

0

Thanks, Alex. Sure, I will take a look into that.

ADD REPLY • link 9.3 years ago by shalini.ravishankar • 0

0

Cross-posted on SO: http://stackoverflow.com/questions/27958594/hadoop-for-human-genome-data

ADD REPLY • link 9.3 years ago by Pierre Lindenbaum 161k

0

I am sorry for the inconvenience. I will delete the other threads.

ADD REPLY • link 9.3 years ago by shalini.ravishankar • 0

0

Cross-posted on Quora: http://qr.ae/6EsvP

ADD REPLY • link updated 2.1 years ago by Ram 43k • written 9.3 years ago by Matt Shirley 10k

Ram · Answer 1 · 2015-01-15

0

9.3 years ago

Alex Reynolds 35k

You might start with Michael Hoffman's Genomedata paper, code and documentation.

ADD COMMENT • link updated 2.1 years ago by Ram 43k • written 9.3 years ago by Alex Reynolds 35k