Very new to bioinformatics.. shiny new.. :)

I know that there is the NIH GenBank, but for people like me who have had part of their genome genotyped by 23andMe and would like that data to be used, with some restrictions, to enhance drug discovery, while perhaps earning $$, are there any options?

What about DNA/RNA/genome data from bacteria, viruses, plants, animals, etc.? I am guessing that, because of fewer privacy concerns, these have probably been sequenced to a much greater extent...

I just participated in a genomic hackathon, and even though I am a MechE, I was blown away by the potential of this. I am trying to find out where I can find any repositories..

Efforts to monetize information lead to restricted access, missed opportunities, and the stifling of science (even if they make a certain set of people money). Even if an individual had a "sellable" genome, how much money do you think that person would be able to make in the long run? Take a look at the Personal Genome Project to see if it changes your thinking.

The creation of GenBank, and the basic tenet of freely accessible sequence data, is perhaps one of the best examples of how legislation can work for the greater good. We have the late Senator Claude Pepper of Florida to thank for this. The idea was later extended under the Bermuda Principles for rapid and public release of genomic data.

Thanks for your reply. You obviously know a lot about open sharing of data. I can understand that data/research results/publications, once produced and put on a server, can be reproduced and spread at almost zero cost (although genomic data, because of its storage and transport cost, may not be as cheap as research results/publications).

I can see how it's beneficial for the consumers of the *omic data sets/research results/publications to have access to these. However, I am wondering how this incentivizes the producers of these data/research results. The benefit, as I see it, comes perhaps from recognition etc., but that’s not going to feed them for long. I have seen first hand how expensive generating high-quality research results and curating/producing data can be in terms of consumables and skilled labor.

To give an analogy, in an analytical chemistry lab, people use all kinds of reagents. It would be easy to claim that if these reagents were available for free, then everybody could produce more research results and do more experiments. But who’s going to produce these reagents for free?

I can definitely see the role of government coming in and subsidizing the cost of producing these datasets to spur the initial snowball effect, as you pointed out. Without this initial investment from the government, this field would have a hard time getting started, or might not start at all. But over the long term, government subsidy cannot cover the cost of producing/sharing genomic data; it has to come from private individuals/entities.

Again, to use an analogy, we can look at how startups and VCs interact. VCs come in during the initial phase of a startup's life, when it has an idea but needs to be bankrolled to turn it into a viable product. But VCs don’t support startups indefinitely. At some point the products that the startups are producing need to stand on their own, and the cost of producing them needs to be paid for by selling the products to private entities.

To use another analogy, I was definitely a proponent of the open-source software movement, as it seemed amazing to me that I could get all this software for free. However, over the long term, I see that closed-source software has definitely gathered more usage, at least among non-coders, who are the majority of people in the world.

So I am curious to know why you think open data is better than closed/paid data sources?

Perhaps there is a source with more detailed discussion on this topic?

But over the long term, government subsidy cannot cover the cost of producing/sharing genomic data; it has to come from private individuals/entities.

Research dollars have many economic benefits (besides the scientific value they generate). One could think of government as "subsidizing" the initial research, but the downstream effects are many. In fact, government/academic institutions used to lose out on the actual economic benefits. There has been a significant change in thinking in this area over the last couple of decades. Now universities and faculty are actively encouraged to spin interesting technologies/compounds off into startups that have direct monetary benefits for them. Private foundations (e.g. the Chan Zuckerberg Initiative) have been providing money for genomic research. Academic entities are not equipped for clinical testing or mass marketing of drugs/therapies, and that is where pure economic interests step in and final costs for approved new drugs balloon.

I am sure there are closed-source/fee-for-access genome databases out there. deCODE genetics has built one for a whole nation. But access to these types of resources is going to be limited. If you were a researcher looking to compare your cohort of genomes against a pool, your chances of being able to do so would be much greater with open data sources than with closed ones. Then there are the legalities involved, which we have not considered at all. When you collect samples/genomic data from individuals, they are asked to give permission to use that data, and those permissions are generally specific. Rarely would you get access to this data to use as you please, and certainly not where "for profit" entities are involved.

The bottom line is that the real benefits are bound to come from a large pool of genomes rather than a single one. A single genome may not have significant worth (not sure if you are thinking of buying/selling at this level), but as part of a pool it will have cumulative value.

There are a few 'grassroots' projects that share human genetic data such as the data from 23andMe with the public, but not free. If you want to share your data you can start there, but I don't think anyone will pay you. The minute money enters the equation you're in a world of law-induced pain where you have to follow strictly laid-out and enforced laws. Here's an older review on the laws in a few European countries; I don't think much has changed since the review came out. If you give your data away for free you're still in a somewhat legal grey zone (you automatically share the data of your ancestors and children too!).

For giving your data away for free, some projects exist:

There's Open Humans:
openSNP (I'm co-founder):
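
If you want to look at what you would actually be sharing before you upload it, here is a minimal Python sketch for reading a raw-data export. It assumes the usual 23andMe layout (comment lines starting with #, then tab-separated rsid, chromosome, position and genotype columns, with "--" for no-calls). The function name parse_23andme and the sample lines, including the rsids, are made up for illustration and are not part of any project's tooling.

```python
import io
from collections import Counter

def parse_23andme(handle):
    """Yield (rsid, chromosome, position, genotype) from a raw-data file handle.

    Assumes the tab-separated layout described above; '#' lines are skipped.
    """
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split("\t")
        yield rsid, chrom, int(pos), genotype

# Made-up sample data, only to show the expected shape of the file.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\t1\t2000\tAG\n"
    "rs0000003\tX\t3000\t--\n"
)

records = list(parse_23andme(io.StringIO(SAMPLE)))
per_chromosome = Counter(chrom for _, chrom, _, _ in records)
no_calls = sum(1 for *_, genotype in records if genotype == "--")

print(len(records), "markers;", no_calls, "no-call(s)")
print(dict(per_chromosome))
```

For a real export you would replace io.StringIO(SAMPLE) with open("your_file.txt"), where the file name is whatever your download is called.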

From time to time there are related competitions at Kaggle, here's a current one: is also running a machine learning challenge with openSNP data:

Manuel Corpas has been following all this, here's his blog:

If you want human data, has started a catalogue: (Manuel Corpas works for them)

The raw DNA and RNA data of the majority of the world's projects is deposited in the Sequence Read Archive (the SRA), which is mirrored in Europe (the ENA) and Japan (the DDBJ Sequence Read Archive). The interface is not straightforward to use (it's best if you use their ascp client to download data), but it's all there, for all species.
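
If the web interface gets in the way, one alternative to clicking through it is to ask the ENA mirror for a run's file locations programmatically. The Python sketch below rests on assumptions you should check against the ENA documentation: that the Portal API "filereport" endpoint at the URL shown accepts the accession, result, fields and format parameters, and that the fastq_ftp column holds semicolon-separated paths without a scheme. The accession SRR0000000 and the sample response are placeholders, not real data, and the code only builds the URL and parses a response; it does not download anything.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint of the ENA Portal API file report service.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"

def filereport_url(accession):
    """Build the query URL for one run/study accession (parameters are assumptions)."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)

def fastq_links(tsv_text):
    """Map each run accession to its list of FASTQ URLs from a TSV report."""
    links = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        paths = [p for p in row["fastq_ftp"].split(";") if p]
        links[row["run_accession"]] = ["ftp://" + p for p in paths]
    return links

# Placeholder response with made-up paths, shaped like the assumed TSV output.
SAMPLE_TSV = (
    "run_accession\tfastq_ftp\n"
    "SRR0000000\tftp.sra.ebi.ac.uk/example_1.fastq.gz;ftp.sra.ebi.ac.uk/example_2.fastq.gz\n"
)

url = filereport_url("SRR0000000")
links = fastq_links(SAMPLE_TSV)
print(url)
print(links)
```

With a real accession you would fetch the URL (for example with urllib.request.urlopen), pass the decoded text to fastq_links, and then download the listed files with whatever client you prefer.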

Thanks.. will look into this.. openSNP looks amazing!

