Question: MongoDB: What's the Most Efficient Way to Store a Genomic Position?
Asked 3.6 years ago by Pierre Lindenbaum (France):

I want to store some genomic positions using MongoDB.

Something like:

{
chrom:"chr2",
position:100,
name:"rs25"
}

I want to be able to quickly find all the records in a given segment. What would be the best key/_id to use?

A (chrom, position) object?

db.snps.save({_id:{chrom:"chr2",position:100},name:"rs25"})

A padded string?

db.snps.save({_id:"chr02:00000000100",chrom:"chr2",position:100,name:"rs25"})
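For illustration, here is a minimal sketch of how such a padded key could be built and used as range bounds. The helper names paddedKey and segmentBounds are hypothetical, and the sketch assumes numeric chromosome names only (chrX, chrY and chrM would need their own mapping). The widths (2 digits for the chromosome, 11 for the position) are copied from the example key above.

```javascript
// Hypothetical helper: build a sortable key such as "chr02:00000000100".
// Assumption: chromosome names are "chr" followed by a number.
function paddedKey(chrom, position) {
  const num = chrom.replace(/^chr/, "");
  return "chr" + num.padStart(2, "0") + ":" + String(position).padStart(11, "0");
}

// Hypothetical helper: inclusive string bounds for a segment on one chromosome.
// In the mongo shell these could be used as
//   db.snps.find({_id: {$gte: bounds.lo, $lte: bounds.hi}})
function segmentBounds(chrom, start, end) {
  return { lo: paddedKey(chrom, start), hi: paddedKey(chrom, end) };
}

const bounds = segmentBounds("chr2", 50, 150);
const key = paddedKey("chr2", 100);
// Because of the zero padding, plain string comparison agrees with numeric order.
const inside = bounds.lo <= key && key <= bounds.hi;
```

The point of the padding is that string order and genomic order coincide, so a range scan on _id covers exactly one segment.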

An auto-generated id with an index on chrom and position?

db.snps.save({chrom:"chr2",position:100,name:"rs25"})
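As a rough sketch of what the segment query could look like with this third layout: the helper names segmentFilter and matches are hypothetical, and matches is only an in-memory stand-in for what the server would do, included so the snippet runs without a database.

```javascript
// Hypothetical helper: filter for all records on `chrom` with start <= position <= end.
// In the mongo shell, assuming a compound index created with
//   db.snps.ensureIndex({chrom: 1, position: 1})   // createIndex in newer versions
// the filter would be passed to db.snps.find(...).
function segmentFilter(chrom, start, end) {
  return { chrom: chrom, position: { $gte: start, $lte: end } };
}

// In-memory stand-in for the server-side match, only to illustrate the semantics.
function matches(doc, filter) {
  return doc.chrom === filter.chrom &&
    doc.position >= filter.position.$gte &&
    doc.position <= filter.position.$lte;
}

const snps = [
  { chrom: "chr2", position: 100, name: "rs25" },
  { chrom: "chr2", position: 5000, name: "snpA" }, // made-up record for the demo
  { chrom: "chr3", position: 100, name: "snpB" }   // made-up record for the demo
];

const filter = segmentFilter("chr2", 50, 150);
const hits = snps.filter(function (d) { return matches(d, filter); });
```

Putting chrom first in the index means the equality part narrows to one chromosome and the range on position is then a contiguous scan.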

Something else?

Thanks for your suggestions,

Pierre

PS: I cross-posted this question on stackoverflow http://stackoverflow.com/questions/3740112

Comment (Pierre Lindenbaum, 3.6 years ago): I posted a benchmark on my blog: http://plindenbaum.blogspot.com/2010/09/indexing-some-genomic-positions-with.html

Answer by brentp (Denver, Colorado), 3.6 years ago:

If you're going to use MongoDB for "spatial" queries, have a look here. It uses a geohash for its 2d indexes, but you can likely shoehorn your 1-D data into it. You'd then be able to take advantage of its spatial queries, such as nearest and within-bounds.

Another option is to hash your 1-D intervals yourself, as you do with the padded string. Intuitively, that should have the best locality in the B-tree. With the options you list above, I suspect you'd have to run a benchmark to see whether there are any noticeable differences.

Some time ago I wrote biohash/interval-hash, which is meant to work on 1-D intervals the way geohash does on 2-D points. It's not fully thought out, but it could be a decent starting point.
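To make the 1-D hashing idea concrete, here is a small sketch of a geohash-style bisection on a single axis. It is not the biohash/interval-hash implementation; the function names hash1d and intervalHash, and the maxPos and bits parameters, are assumptions made up for this example.

```javascript
// Hypothetical 1-D analogue of a geohash: bisect [0, maxPos) `bits` times and
// record which half contains the position ("0" = lower half, "1" = upper half).
function hash1d(position, maxPos, bits) {
  let lo = 0, hi = maxPos, out = "";
  for (let i = 0; i < bits; i++) {
    const mid = (lo + hi) / 2;
    if (position >= mid) { out += "1"; lo = mid; }
    else { out += "0"; hi = mid; }
  }
  return out;
}

// Hypothetical interval hash: the longest common prefix of the two endpoint
// hashes, i.e. the smallest bisection cell that contains the whole interval.
function intervalHash(start, end, maxPos, bits) {
  const a = hash1d(start, maxPos, bits);
  const b = hash1d(end, maxPos, bits);
  let i = 0;
  while (i < bits && a[i] === b[i]) i++;
  return a.slice(0, i);
}

const near = intervalHash(100, 120, 1024, 4); // both ends fall in the same small cell
const wide = intervalHash(100, 900, 1024, 4); // ends fall in different halves
```

Positions that are close together share a long prefix, so a prefix query on the hash behaves like a coarse range query. An interval that straddles a bisection boundary gets a short (possibly empty) hash, which is the usual weakness of this kind of scheme.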

Comment (Pierre Lindenbaum, 3.6 years ago): Thanks, I'm going to validate this interesting answer.