Question: MongoDB: What's the most efficient way to store a genomic position?
5.7 years ago, Pierre Lindenbaum (82k) wrote:

I want to store some genomic positions using MongoDB.

something like:


I want to be able to quickly find all the records in a given segment. What would be the best key/_id to use?

a (chrom, position) object? {_id:{chrom:"chr2",position:100},name:"rs25"}

a padded string? {_id:"chr02:00000000100",chrom:"chr2",position:100,name:"rs25"}

an auto-generated id with an index on chrom and position? {chrom:"chr2",position:100,name:"rs25"}

other?
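
To make the comparison concrete, here is a minimal Python sketch of what the segment query could look like for the padded-string layout and for the indexed chrom/position layout. The helper names (`padded_id`, `padded_range_filter`, `compound_range_filter`) are hypothetical, and the padding widths (2 digits for the chromosome, 11 for the position) are assumed from the example above. The filters are plain dicts using MongoDB's `$gte`/`$lte` query operators, so nothing here talks to a server.

```python
def padded_id(chrom: str, position: int, width: int = 11) -> str:
    """Build a padded string key such as 'chr02:00000000100' (assumed widths)."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    # Pad numeric chromosome names to two digits; leave names like X or Y as-is.
    if name.isdigit():
        name = name.zfill(2)
    return f"chr{name}:{position:0{width}d}"


def padded_range_filter(chrom: str, start: int, end: int) -> dict:
    """Segment query on the padded string _id: a lexicographic range."""
    return {"_id": {"$gte": padded_id(chrom, start),
                    "$lte": padded_id(chrom, end)}}


def compound_range_filter(chrom: str, start: int, end: int) -> dict:
    """Segment query on separate chrom and position fields: a numeric range."""
    return {"chrom": chrom, "position": {"$gte": start, "$lte": end}}


padded_query = padded_range_filter("chr2", 100, 200)
compound_query = compound_range_filter("chr2", 100, 200)
```

Either dict could then be handed to a driver's `find`. The second form presumably only performs well with a compound index on (chrom, position), which is the index the third option refers to.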


Thanks for your suggestion(s).


PS: I cross-posted this question on Stack Overflow.

database position index • 2.4k views

I posted a benchmark on my blog:

written 5.7 years ago by Pierre Lindenbaum (82k)
5.7 years ago, brentp (21k), Salt Lake City, UT, wrote:

If you're going to use MongoDB for "spatial" queries, have a look here. It uses a geohash for 2d indexes, but you can likely shoehorn your 1d data into it. Then you'd be able to take advantage of its spatial queries, such as nearest and within bounds.

Another option is to hash your 1d intervals yourself, as you do with the padded string. Intuitively, that must have the best locality in the B-tree. With your options above, I suspect you'd have to run a benchmark to see whether there are any noticeable differences.
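
A small Python sketch of why the padding matters for locality: with zero-padded keys, plain string order matches genomic order, so neighbouring positions end up next to each other in a sorted index. The `padded_key` helper and the widths (2 and 11 digits) are assumptions taken from the example key in the question.

```python
def padded_key(chrom_number: int, position: int, width: int = 11) -> str:
    """Zero-padded key, e.g. (2, 100) -> 'chr02:00000000100'."""
    return f"chr{chrom_number:02d}:{position:0{width}d}"


positions = [(2, 100), (10, 5), (2, 99), (2, 1000)]

# Genomic order: by chromosome number, then by position.
genomic = sorted(positions)

# String order of the padded keys agrees with genomic order...
padded = sorted(padded_key(c, p) for c, p in positions)
assert padded == [padded_key(c, p) for c, p in genomic]

# ...whereas unpadded keys sort 'chr10' before 'chr2' and '1000' before '99'.
unpadded = sorted(f"chr{c}:{p}" for c, p in positions)
assert unpadded != [f"chr{c}:{p}" for c, p in genomic]
```

This only shows the ordering property; whether it translates into a measurable speed-up is exactly what a benchmark would have to tell.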

Some time ago, I wrote biohash/interval-hash that would work on 1d intervals as geohash does on 2d points. It's not fully thought out, but it could be a decent starting point.
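
To illustrate the general idea of a geohash on one dimension, here is a rough Python sketch based on repeated bisection. It is not necessarily how biohash/interval-hash works; the function names, the `length` parameter (the size of the coordinate space) and the `bits` depth are all made up for this example.

```python
def point_hash(position: int, length: int, bits: int = 16) -> str:
    """Halve [0, length) repeatedly; record '0' for the lower half, '1' for the upper."""
    lo, hi = 0, length
    code = []
    for _ in range(bits):
        mid = (lo + hi) // 2
        if position < mid:
            code.append("0")
            hi = mid
        else:
            code.append("1")
            lo = mid
    return "".join(code)


def interval_hash(start: int, end: int, length: int, bits: int = 16) -> str:
    """Longest common prefix of the endpoint codes.

    The prefix names the smallest bisection cell that contains [start, end].
    """
    prefix = []
    for a, b in zip(point_hash(start, length, bits), point_hash(end, length, bits)):
        if a != b:
            break
        prefix.append(a)
    return "".join(prefix)


# Toy coordinate space of 1024 positions, 10 levels of bisection.
small = interval_hash(100, 103, 1024, bits=10)
wider = interval_hash(100, 120, 1024, bits=10)
straddling = interval_hash(500, 520, 1024, bits=10)
```

Nested intervals share a prefix (`small` starts with `wider`), which is what makes prefix searches possible. The known weak spot of this kind of scheme is an interval that straddles a cell boundary, like `straddling` above, which gets an empty hash however short it is.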


Thanks, I'm going to validate this interesting answer.

written 5.7 years ago by Pierre Lindenbaum (82k)

Hi Pierre, I couldn't find geohash in your benchmark. Did you get to compare it?

written 5 months ago by Eliad (10)