Question: MongoDB: What's The Most Efficient Way To Store A Genomic Position
7.1 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum (99k) wrote:

I want to store some genomic positions using MongoDB.

something like:


I want to be able to quickly find all the records in a given segment. What would be the best key/_id to use?

a (chrom, position) object? {_id:{chrom:"chr2",position:100},name:"rs25"}

a padded string? {_id:"chr02:00000000100",chrom:"chr2",position:100,name:"rs25"}

an auto-generated id with an index on chrom and position? {chrom:"chr2",position:100,name:"rs25"}

other?
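For reference, the third layout (auto-generated _id plus an index on chrom and position) can be queried with an ordinary range filter. This is only a minimal sketch: the names `SEGMENT_INDEX` and `segment_filter` are made up for illustration, and only the field names chrom and position come from the documents above.

```python
# Compound index specification as a list of (field, direction) pairs,
# the form accepted by pymongo's create_index. 1 means ascending.
SEGMENT_INDEX = [("chrom", 1), ("position", 1)]


def segment_filter(chrom, start, end):
    """Build a MongoDB filter for positions in [start, end] on one chromosome.

    Hypothetical helper: it only builds the filter document and does not
    talk to a server.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    return {"chrom": chrom, "position": {"$gte": start, "$lte": end}}
```

With pymongo this would presumably be used as `collection.create_index(SEGMENT_INDEX)` followed by `collection.find(segment_filter("chr2", 50, 150))`.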


thanks for your suggestion(s)


PS: I cross-posted this question on Stack Overflow

database position index • 3.2k views

I posted a benchmark on my blog:

written 7.1 years ago by Pierre Lindenbaum (99k)
7.1 years ago by
Salt Lake City, UT
brentp (22k) wrote:

If you're going to use MongoDB for "spatial" queries, have a look here. It uses a geohash for 2d indexes, but you can likely shoehorn your 1d data into it. Then you'd be able to take advantage of its spatial queries, such as nearest and within bounds.
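A rough sketch of what shoehorning 1d positions into a 2d index could look like. Everything here is an assumption for illustration: the field name `loc`, the `CHROM_ROW` mapping that puts each chromosome on its own row, and the `MAX_POSITION` bound. The one thing to watch is that a 2d index defaults to longitude/latitude bounds of -180 to 180, so the `min` and `max` index options have to be widened for genomic coordinates.

```python
# Hypothetical mapping: each chromosome gets its own y coordinate ("row").
CHROM_ROW = {f"chr{i}": i for i in range(1, 23)}
CHROM_ROW.update({"chrX": 23, "chrY": 24, "chrM": 25})

# Assumed upper bound on a position; must exceed the longest chromosome.
MAX_POSITION = 300_000_000

# Index spec and options in the form pymongo's create_index accepts.
# min/max replace the default -180..180 bounds of a 2d index.
INDEX_KEYS = [("loc", "2d")]
INDEX_OPTIONS = {"min": 0, "max": MAX_POSITION}


def to_point(chrom, position):
    """Encode a genomic position as a legacy [x, y] coordinate pair."""
    return [position, CHROM_ROW[chrom]]


def box_filter(chrom, start, end):
    """Build a $geoWithin/$box filter covering [start, end] on one row.

    The box is half a unit tall on each side of the row so that it
    contains this chromosome's points and no neighbouring row.
    """
    row = CHROM_ROW[chrom]
    return {"loc": {"$geoWithin": {"$box": [[start, row - 0.5], [end, row + 0.5]]}}}
```

Documents would then carry something like `loc: to_point("chr2", 100)`, and the index would be created with `collection.create_index(INDEX_KEYS, **INDEX_OPTIONS)`. Whether this beats a plain B-tree index on a 1d key is exactly the kind of thing that needs a benchmark.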

Another option is to hash your 1d intervals yourself, as you do with the padded string. Intuitively, that should have the best locality in the B-tree. With your options above, I suspect you'd have to run a benchmark to see whether there are any noticeable differences.
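To make the padded-string idea concrete, here is a minimal sketch. The widths are taken from the "chr02:00000000100" example in the question (2 digits for the chromosome number, 11 for the position); the function names and the handling of non-numeric chromosomes such as chrX are assumptions. The point of the padding is that string order on the key then agrees with numeric order on the position, so a range scan on _id returns one segment.

```python
POSITION_WIDTH = 11  # digits in the position part of "chr02:00000000100"
CHROM_WIDTH = 2      # digits in the numeric chromosome part


def padded_key(chrom, position):
    """Build a sortable key such as "chr02:00000000100"."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name.isdigit():
        # Zero-pad numeric names so "chr2" sorts before "chr10".
        name = name.zfill(CHROM_WIDTH)
    if not 0 <= position < 10 ** POSITION_WIDTH:
        raise ValueError("position does not fit in the padded width")
    return f"chr{name}:{position:0{POSITION_WIDTH}d}"


def key_range_filter(chrom, start, end):
    """Build a MongoDB filter selecting keys in [start, end] on one chromosome."""
    return {"_id": {"$gte": padded_key(chrom, start), "$lte": padded_key(chrom, end)}}
```

For example, `padded_key("chr2", 100)` gives "chr02:00000000100", and `key_range_filter("chr2", 50, 150)` can be passed to `find` to scan that segment using the default _id index.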

Some time ago, I wrote biohash/interval-hash, which would work on 1d intervals as geohash does on 2d points. It's not fully thought out, but it could be a decent starting point.


thanks, I'm going to validate this interesting answer.

written 7.1 years ago by Pierre Lindenbaum (99k)

Hi Pierre, I couldn't find geohash in your benchmark. Did you get to compare it?

written 22 months ago by Eliad (40)