Question: MongoDB: What's the most efficient way to store a genomic position?
Pierre Lindenbaum wrote, 5.2 years ago:

I want to store some genomic positions using MongoDB.

something like:


I want to be able to quickly find all the records in a given segment. What would be the best key/_id to use?

a (chrom, position) object? {_id: {chrom: "chr2", position: 100}, name: "rs25"}

a padded string? {_id: "chr02:00000000100", chrom: "chr2", position: 100, name: "rs25"}

an auto-generated id with an index on chrom and position? {chrom: "chr2", position: 100, name: "rs25"}

other ?
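
To make the options concrete, here is a minimal sketch of the range filters that the padded-string and the indexed-fields layouts would need. The helper names (paddedId, segmentFilterById, segmentFilterByFields) are hypothetical, and the padding widths (2 digits for the chromosome, 11 for the position) are simply read off the single example above, so treat them as assumptions.

```javascript
// Hypothetical helper: builds a key in the "chr02:00000000100" style.
// The widths (2 and 11) are an assumption taken from that one example.
function paddedId(chrom, position) {
  const name = chrom.replace(/^chr/, "");
  // Numeric chromosomes are zero-padded; names such as "X" are kept as-is (assumption).
  const chromPart = /^\d+$/.test(name) ? name.padStart(2, "0") : name;
  return "chr" + chromPart + ":" + String(position).padStart(11, "0");
}

// Padded-string _id: zero-padding makes string order match numeric order
// within a chromosome, so a segment becomes a plain range on _id.
function segmentFilterById(chrom, start, end) {
  return { _id: { $gte: paddedId(chrom, start), $lte: paddedId(chrom, end) } };
}

// Auto-generated _id: the same segment expressed on the chrom and position
// fields. In the mongo shell this would rely on a compound index, e.g.
//   db.snps.createIndex({ chrom: 1, position: 1 })   // "snps" is a made-up name
function segmentFilterByFields(chrom, start, end) {
  return { chrom: chrom, position: { $gte: start, $lte: end } };
}
```

Either filter object can be passed to find(), for example db.snps.find(segmentFilterById("chr2", 50, 150)). Which layout is faster is exactly what a benchmark would have to show.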


thanks for your suggestion(s)


PS: I cross-posted this question on Stack Overflow.

Tags: database, position, index

I posted a benchmark on my blog:

(comment by Pierre Lindenbaum, 5.2 years ago)
brentp (Salt Lake City, UT) wrote, 5.2 years ago:

If you're going to use MongoDB for "spatial" queries, have a look here. It uses a geohash for 2d indexes, but you can likely shoehorn your 1d data into it. Then you'd be able to take advantage of its spatial queries, such as nearest and within bounds.
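
One way the shoehorning could look, as a rough sketch rather than a tested recipe: store each position as a 2d point with a constant y of 0, widen the bounds of the 2d index, and query with a thin box. The field name loc, the collection name snps, the helper names and the bound MAX_COORD are all assumptions made for illustration. Older MongoDB versions spell the operator $within instead of $geoWithin.

```javascript
// Assumed upper bound for the index, larger than any chromosome length.
const MAX_COORD = 1e9;

// Store the 1d position as a 2d point whose y coordinate is always 0.
function toPointDoc(chrom, position, name) {
  return { chrom: chrom, loc: [position, 0], name: name };
}

// Spec and options for a "2d" index. The default bounds are -180..180
// (longitude/latitude), so they have to be widened for genomic coordinates.
const indexSpec = { loc: "2d" };
const indexOptions = { min: -1, max: MAX_COORD };

// Filter for records on `chrom` with start <= position <= end:
// a box that is wide in x and thin in y around y = 0.
function segmentBoxFilter(chrom, start, end) {
  return {
    chrom: chrom,
    loc: { $geoWithin: { $box: [[start, -0.5], [end, 0.5]] } },
  };
}

// In the mongo shell this would be used roughly as:
//   db.snps.createIndex(indexSpec, indexOptions)
//   db.snps.insert(toPointDoc("chr2", 100, "rs25"))
//   db.snps.find(segmentBoxFilter("chr2", 50, 150))
```

Whether this beats a plain index on chrom and position is not obvious and would need measuring.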

Another option is to hash your 1d intervals yourself, as you do with the padded string. Intuitively, that should have the best locality in the B-tree. With the options you list above, I suspect you'd have to run a benchmark to see whether there are any noticeable differences.

Some time ago, I wrote biohash/interval-hash, which works on 1d intervals the way geohash works on 2d points. It's not fully thought out, but it could be a decent starting point.
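
To illustrate the general idea of a geohash-like key for 1d intervals, here is a small sketch. It is not the biohash/interval-hash code, just one possible scheme under stated assumptions: the coordinate space [0, max) is halved repeatedly, and each step appends "0" or "1" for as long as the half-open interval [start, end) fits entirely in one half. The function names and the default max and maxDepth values are made up for this example.

```javascript
// Sketch only: NOT the biohash/interval-hash implementation.
// Returns the bit-string prefix of the smallest dyadic bin of [0, max)
// that fully contains the half-open interval [start, end).
function intervalHash(start, end, max = 2 ** 28, maxDepth = 20) {
  let lo = 0;
  let hi = max;
  let hash = "";
  for (let depth = 0; depth < maxDepth; depth++) {
    const mid = lo + (hi - lo) / 2;
    if (end <= mid) {
      hash += "0"; // interval lies in the lower half
      hi = mid;
    } else if (start >= mid) {
      hash += "1"; // interval lies in the upper half
      lo = mid;
    } else {
      break; // interval straddles the midpoint: stop refining
    }
  }
  return hash;
}

// Two intervals can only overlap if one hash is a prefix of the other;
// otherwise they fell into disjoint halves at some level.
function mayOverlap(hashA, hashB) {
  return hashA.startsWith(hashB) || hashB.startsWith(hashA);
}
```

Stored as a string field with an ordinary index, such a hash keeps nearby intervals close together in the B-tree. A prefix match only yields candidates, so the actual start and end coordinates still have to be checked afterwards.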


Thanks, I'm going to validate this interesting answer.

(comment by Pierre Lindenbaum, 5.2 years ago)