Question: MongoDB: What's the most efficient way to store a genomic position?
Pierre Lindenbaum (85k) wrote, 6.0 years ago:

I want to store some genomic positions using MongoDB.

Something like:


I want to be able to quickly find all the records in a given segment. What would be the best key/_id to use?

a (chrom, position) object as the _id? {_id:{chrom:"chr2",position:100},name:"rs25"}

a padded string? {_id:"chr02:00000000100",chrom:"chr2",position:100,name:"rs25"}

an auto-generated _id with an index on chrom and position? {chrom:"chr2",position:100,name:"rs25"}

something else?
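For the third option, the segment lookup is an equality match on chrom followed by a range on position. The sketch below only builds the filter documents as plain Python dicts; the pymongo calls appear in comments so it runs without a server, and the collection name `coll` is hypothetical.

```python
# Hypothetical pymongo usage (collection name `coll` is an assumption):
#   coll.create_index([("chrom", 1), ("position", 1)])
#   coll.find(segment_query("chr2", 100, 500)).sort(SORT_SPEC)

SORT_SPEC = [("chrom", 1), ("position", 1)]


def make_record(chrom: str, position: int, name: str) -> dict:
    """Option 3: MongoDB generates _id; chrom and position are plain fields."""
    return {"chrom": chrom, "position": position, "name": name}


def segment_query(chrom: str, start: int, end: int) -> dict:
    """Filter for records on `chrom` with start <= position <= end.

    Equality on the leading index field (chrom) followed by a range on the
    second field (position) lets a compound {chrom: 1, position: 1} index
    answer the query with one contiguous index scan.
    """
    if start > end:
        raise ValueError("start must be <= end")
    return {"chrom": chrom, "position": {"$gte": start, "$lte": end}}
```

Putting chrom first in the compound index matters: with position first, a single-chromosome range query would have to scan matching positions on every chromosome.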


Thanks for your suggestion(s).


PS: I cross-posted this question on Stack Overflow.


I posted a benchmark on my blog:

Reply by Pierre Lindenbaum (85k), 6.0 years ago.
brentp (21k, Salt Lake City, UT) wrote, 6.0 years ago:

If you're going to use MongoDB for "spatial" queries, have a look here. It uses a geohash for 2d indexes, but you can likely shoehorn your 1-D data into it. You would then be able to use its spatial queries, such as nearest-neighbour and within-bounds searches.
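One way to shoehorn 1-D positions into a 2d index is to give each chromosome its own row, so a SNP becomes the point [position, row]. A minimal sketch under these assumptions: the `CHROM_ROW` mapping, the field name `loc`, the index bounds and the collection `coll` are all hypothetical, and only the filter documents are built here.

```python
# Hypothetical chromosome -> row mapping: a 1-D position on a chromosome
# becomes the 2-D point [position, row].
CHROM_ROW = {"chr1": 1, "chr2": 2, "chrX": 23, "chrY": 24}

# The default 2d index bounds are [-180, 180), far too small for genomic
# coordinates, so the index needs wider bounds, e.g. with pymongo:
#   coll.create_index([("loc", "2d")], min=-1, max=300_000_000)
# (300_000_000 is an assumed bound above the longest chromosome; the `bits`
# precision option is also worth checking for this coordinate range)


def make_record(chrom: str, position: int, name: str) -> dict:
    """Store the position as a legacy [x, y] coordinate pair."""
    return {"loc": [position, CHROM_ROW[chrom]], "chrom": chrom, "name": name}


def segment_query(chrom: str, start: int, end: int) -> dict:
    """$box filter covering integer positions start..end on one chromosome row.

    Padding the box by 0.5 on both axes keeps integer positions and the row
    itself strictly inside the box and excludes neighbouring rows, so edge
    semantics of $box do not matter.
    """
    row = CHROM_ROW[chrom]
    return {"loc": {"$geoWithin": {"$box": [[start - 0.5, row - 0.5],
                                            [end + 0.5, row + 0.5]]}}}
```

The same points could then be used with `$near` for nearest-SNP lookups, although the distance would then mix the position and row axes.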

Another option is to hash your 1-D intervals yourself, as you do with the padded string. Intuitively, that should give the best locality in the B-tree. With the options above, I suspect you'd have to run a benchmark to see whether there are any noticeable differences.
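The padded-string key works because zero-padding makes lexicographic order equal (chromosome, position) order, so a segment becomes one contiguous key range. The sketch below emulates that ordered range scan with `bisect` over a sorted list; the key layout follows the question's "chr02:00000000100" example, and handling only numeric chromosome names is an assumption.

```python
import bisect


def padded_key(chrom: str, position: int) -> str:
    """Build a fixed-width key such as "chr02:00000000100".

    Assumes numeric chromosome names ("chr1".."chr22"); chrX/chrY/chrM would
    need an extra (hypothetical) rule, e.g. mapping X -> 23.
    """
    number = int(chrom[3:])  # strip the "chr" prefix
    return f"chr{number:02d}:{position:011d}"


def range_scan(sorted_keys: list[str], chrom: str, start: int, end: int) -> list[str]:
    """Return keys on `chrom` with start <= position <= end.

    The matches form one contiguous slice of the sorted keys, which is what an
    ordered B-tree index on _id can exploit.
    """
    lo = bisect.bisect_left(sorted_keys, padded_key(chrom, start))
    hi = bisect.bisect_right(sorted_keys, padded_key(chrom, end))
    return sorted_keys[lo:hi]


def mongo_filter(chrom: str, start: int, end: int) -> dict:
    """Equivalent MongoDB filter on the padded _id."""
    return {"_id": {"$gte": padded_key(chrom, start), "$lte": padded_key(chrom, end)}}
```

Without padding the order breaks: `sorted(["chr2:100", "chr2:99", "chr10:5"])` puts "chr10:5" first and "chr2:100" before "chr2:99".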

Some time ago I wrote biohash/interval-hash, which works on 1-D intervals the way geohash works on 2-D points. It's not fully thought out, but it could be a decent starting point.
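As an illustration of hashing 1-D intervals, here is a different, well-known technique, not the code of biohash/interval-hash: the UCSC-style hierarchical binning described in the SAM specification (`reg2bin`/`reg2bins`). Each interval gets the smallest bin that contains it, and a query only needs to look at a short list of candidate bins. It assumes 0-based, half-open coordinates below 2^29 (512 Mbp). The MongoDB field names in the comment are hypothetical.

```python
# (offset, shift) per level, finest level (2^14 = 16 kbp bins) first.
LEVELS = ((4681, 14), (585, 17), (73, 20), (9, 23), (1, 26))

# Hypothetical MongoDB usage (field names are assumptions):
#   doc:    {"chrom": "chr2", "start": s, "end": e, "bin": reg2bin(s, e)}
#   index:  [("chrom", 1), ("bin", 1)]
#   filter: {"chrom": c, "bin": {"$in": reg2bins(qs, qe)},
#            "start": {"$lt": qe}, "end": {"$gt": qs}}


def reg2bin(beg: int, end: int) -> int:
    """Smallest bin fully containing the 0-based, half-open interval [beg, end)."""
    end -= 1
    for offset, shift in LEVELS:
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
    return 0  # interval spans more than one 64 Mbp bin


def reg2bins(beg: int, end: int) -> list[int]:
    """All bins that may hold intervals overlapping [beg, end)."""
    end -= 1
    bins = [0]
    for offset, shift in reversed(LEVELS):  # coarse to fine, as in the SAM spec
        bins.extend(range(offset + (beg >> shift), offset + (end >> shift) + 1))
    return bins
```

The `$in` on bins narrows the candidates; the `start`/`end` comparisons in the filter then keep only the intervals that actually overlap the query.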


Thanks, I'm going to accept this interesting answer.

Reply by Pierre Lindenbaum (85k), 6.0 years ago.

Hi Pierre, I couldn't find the geohash approach in your benchmark. Did you get a chance to compare it?

Reply by Eliad (10), 9 months ago.
Powered by Biostar version 2.3.0