Hello all :)
There are many biology-related graph databases out there, which makes a lot of sense: biological problems are a good fit for graphs. Unfortunately, many of the graph databases I've come across are not particularly up to date (a PhD project long forgotten), have seriously overcomplicated schemas, and/or are simply not downloadable (you can query their graph via the website, but you can't download the database and use it locally to issue complex queries).
I believe this combination of problems has led to graph databases seeing less use than they probably deserve as a generic format for biological data. It also doesn't help that there are few good in-browser visualisation tools that let you query a graph graphically, without first learning Gremlin or Cypher (roughly, SQL for graphs). Tools like that are what a lot of biologists and already-overloaded bioinformaticians would probably appreciate.
As far as I can see, there are two ways forward:
1. Rebuild popular SQL databases, like the one behind UCSC's Table Browser, as a MUCH simpler graph database. Allow users to download the full graph, or sub-graphs, through an easy query form.
2. Create a generic CSV parser/importer for genomic data. It would unroll columns that hold multiple values per cell (e.g. exon lists, VCF fields), understand biological formats (e.g. chr:start-end coordinates), help you design a schema for the resulting graph, and finally output a graph database.
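For option 2, a minimal sketch of the two fiddliest steps (parsing chr:start-end strings and unrolling multi-value cells such as UCSC-style comma-separated exon lists) might look something like this. The column names (`name`, `gene`, `region`, `exonStarts`, `exonEnds`), the node labels and the edge types are all hypothetical choices for illustration, and the sample row is made up; a real importer would let the user map columns to a schema and would write the node/edge lists out in whatever format the target graph database imports.

```python
import csv
import io
import re

# Matches "chr1:1000-2000" style coordinates; commas inside numbers are allowed ("chr1:1,000-2,000").
REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_region(text):
    """Split a 'chrom:start-end' string into (chrom, start, end) with integer coordinates.

    Coordinates are kept exactly as written; this sketch does NOT convert between
    0-based and 1-based conventions.
    """
    m = REGION_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a region: {text!r}")
    return m["chrom"], int(m["start"].replace(",", "")), int(m["end"].replace(",", ""))


def split_multi(cell):
    """Split a multi-valued cell like '100,200,300,' (trailing comma allowed) into a list."""
    return [v for v in cell.split(",") if v != ""]


def rows_to_graph(rows):
    """Turn transcript rows (hypothetical columns) into a node dict and an edge list."""
    nodes, edges = {}, []
    for row in rows:
        chrom, start, end = parse_region(row["region"])
        tx_id, gene_id = row["name"], row["gene"]
        nodes[tx_id] = {"label": "Transcript", "chrom": chrom, "start": start, "end": end}
        nodes.setdefault(gene_id, {"label": "Gene"})
        edges.append((gene_id, "HAS_TRANSCRIPT", tx_id))
        # Unroll the paired multi-value cells: one Exon node per (start, end) pair.
        starts, ends = split_multi(row["exonStarts"]), split_multi(row["exonEnds"])
        if len(starts) != len(ends):
            raise ValueError(f"{tx_id}: exonStarts/exonEnds length mismatch")
        for i, (s, e) in enumerate(zip(starts, ends), start=1):
            exon_id = f"{tx_id}.exon{i}"
            nodes[exon_id] = {"label": "Exon", "chrom": chrom, "start": int(s), "end": int(e)}
            edges.append((tx_id, "HAS_EXON", exon_id))
    return nodes, edges


# Made-up, tab-separated example input with one transcript and two exons.
sample = "name\tgene\tregion\texonStarts\texonEnds\nNM_0001\tGENE1\tchr1:1,000-5,000\t1000,3000,\t1500,5000,\n"
nodes, edges = rows_to_graph(csv.DictReader(io.StringIO(sample), delimiter="\t"))
for edge in edges:
    print(edge)
```

The interesting design decision is the schema step: here each exon becomes its own node hanging off its transcript, which is what makes queries like "which genes share an exon coordinate" cheap in a graph, but the same unrolling could just as easily emit exons as a list property on the transcript node.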
The former option would be simple, but laborious. It would also need constant updating to stay in sync with UCSC etc., and as soon as their SQL schema changes, everything breaks. :P
The latter would be more challenging to build and more complicated to use, but would provide the most flexibility and compatibility down the road.
Before I set off down the road of building one or both of these, does anyone know whether either option already exists?
Is there an easier way I have overlooked?
Would you like to help me remake the UCSC database (or parts of it) as a graph database?
Thank you all in advance! :D