Question: Locations Of Plots Of Quantities Of Publicly Available Biological Data
6
9.9 years ago by
Gotgenes460
Bethesda, MD, USA
Gotgenes460 wrote:

There's a cliché in talks and presentations these days demonstrating the rapid (typically exponential, or super-exponential) growth of publicly available biological data of one nature or another (e.g., sequence data, yeast two-hybrid, etc.). These plots are frequently juxtaposed against a plot of Moore's law. You know the type. You have probably even used or made such a plot if you're at this site.

It's not always obvious where to find these plots. Surprisingly (disappointingly, even), major clearing houses for biological data such as GenBank and Gene Expression Omnibus (GEO) don't provide plots of their growth in any obvious location, let alone their front pages (where it makes the most sense to display such positive trends). Let's compile a list of where to find these plots, including, but not limited to:

  • Publications (decent)
  • Open-access publications (good)
  • Sites that provide up-to-date plots (better)
  • Scripts or programs that generate plots on the fly (excellent)
visualization • 3.6k views
ADD COMMENTlink modified 9.1 years ago by Casey Bergman18k • written 9.9 years ago by Gotgenes460
3

I think it would also be interesting to post code that can generate these plots. The data are often available, although often not in the best format, for those who'd like to try a roll-your-own approach.

ADD REPLYlink written 9.9 years ago by Neilfws48k
2

Good to see you here!

ADD REPLYlink written 9.9 years ago by Paulo Nuin3.7k
6
9.9 years ago by
Mary11k
Boston MA area
Mary11k wrote:

We started this the other day. See this thread: Exponentially Increasing Genomes Slide. Another one I like that hasn't come up yet is the growth of GeneTests (diseases for which testing is available): http://www.ncbi.nlm.nih.gov/projects/GeneTests/static/whatsnew/labdirgrowth.shtml

ADD COMMENTlink modified 12 months ago by RamRS30k • written 9.9 years ago by Mary11k
1

was about to write the same thing, you were 3 secs faster ;)

ADD REPLYlink written 9.9 years ago by Michael Schubert7.0k
1

Thanks. I failed in picking my search terms to look for an existing question. I don't know if we should close this question as a duplicate, as I'm interested in any type of (high-throughput) biological data.

ADD REPLYlink written 9.9 years ago by Gotgenes460

then you may want to refine your question in order to not be a duplicate ;)

ADD REPLYlink written 9.9 years ago by Michael Schubert7.0k
5
9.9 years ago by
Casey Bergman18k
Athens, GA, USA
Casey Bergman18k wrote:

Data for the growth of the number of articles in MEDLINE can be found here:

http://www.nlm.nih.gov/bsd/licensee/baselinestats.html

There is some time lag in interpreting numbers from the MEDLINE baseline files. For example, good data on the growth of MEDLINE through 2008 can be found in the 2010 baseline statistics: http://www.nlm.nih.gov/bsd/licensee/2010_stats/2010_Totals.html

EDIT 1: Data for the growth of the number of GeneRIFs in Entrez Gene can be found here:

http://www.ncbi.nlm.nih.gov/projects/GeneRIF/stats/

EDIT 2: Data for the growth of the number of GWAS studies in the Human Genome Epidemiology database:

http://hugenavigator.net/HuGENavigator/startPageWatch.do

ADD COMMENTlink modified 9.7 years ago • written 9.9 years ago by Casey Bergman18k
5
9.9 years ago by
Manhattan, NY
Khader Shameer18k wrote:

I already added sequence data growth in UniProt in the other question. As you are interested in various data categories, here is the exponential growth of RCSB-PDB from the 1970s to date. Kudos to the RCSB-PDB team for providing the data and the graph in a convenient way.


EDIT by RamRS: Khader's link to his own answer is dead and does not point to a post on biostars.org because the post seems to have been lost before migration. Here is a link to an archived version of the post: https://web.archive.org/web/20111124051054/http://biostar.stackexchange.com/questions/2966/exponentially-increasing-genomes-slide/2973

[Image: picture of Khader's archived answer]

ADD COMMENTlink modified 12 months ago by RamRS30k • written 9.9 years ago by Khader Shameer18k
4
9.9 years ago by
Cambridge, UK
Michael Schubert7.0k wrote:

You might also want to take a look at this:

Björk B-C, Welling P, Laakso M, Majlender P, Hedlund T, et al. (2010) Open Access to the Scientific Journal Literature: Situation 2009. PLoS ONE 5(6): e11273

Edit: there are some issues with the paper; see Lars' blog post.

ADD COMMENTlink written 9.9 years ago by Michael Schubert7.0k
4
9.9 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

Just a brief note on a way to generate "growth of database" data yourself, at least for the Entrez databases.

Most of the Bio* projects include an EUtils library. The BioRuby module has a useful method, esearch_count, which counts the number of results for a query. As an example, you could retrieve total publications in PubMed for years 2000-2010 like this:

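#!/usr/bin/ruby
# Print yearly PubMed record counts as tab-delimited "year<TAB>count" lines.
require "rubygems"
require "bio"

# Identify yourself to NCBI; replace this with your own address.
Bio::NCBI.default_email = "me@me.com"
ncbi = Bio::NCBI::REST.new

2000.upto(2010) do |year|
  # [dp] restricts the search to the publication date field.
  count = ncbi.esearch_count("#{year}[dp]", {"db" => "pubmed"})
  puts "#{year}\t#{count}"
end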

Redirect the output to a file to create a tab-delimited table of year and count. Here, we're searching the DP (Date of Publication) field in PubMed. You could substitute any Entrez database, search term(s) and years.
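
Once you have the year + count table, a rough way to put a number on the "exponential growth" claim is a least-squares fit of log(count) against year. The sketch below is only an illustration: parse_counts and fit_exponential are hypothetical helper names, the counts in the demo are synthetic (they double every year by construction, they are not real PubMed numbers), and the fit assumes every count is positive.

```ruby
# Estimate an exponential growth rate from "year<TAB>count" lines.
# parse_counts and fit_exponential are hypothetical helpers for this sketch.

# Parse tab-delimited text into [[year, count], ...], skipping blank lines.
def parse_counts(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    year, count = line.split("\t")
    [Integer(year), Integer(count)]
  end
end

# Least-squares fit of ln(count) = intercept + rate * year.
# Assumes at least two distinct years and strictly positive counts.
# Returns the per-year growth rate and the implied doubling time in years.
def fit_exponential(points)
  n = points.size.to_f
  xs = points.map { |year, _| year.to_f }
  ys = points.map { |_, count| Math.log(count) }
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  sxx = xs.sum { |x| (x - mean_x)**2 }
  rate = sxy / sxx
  { rate: rate, doubling_time: Math.log(2) / rate }
end

# Demo on synthetic data: counts double each year, starting from 1000.
synthetic = (2000..2010).map { |year| "#{year}\t#{1000 * 2**(year - 2000)}" }.join("\n")
fit = fit_exponential(parse_counts(synthetic))
puts format("growth rate: %.4f per year", fit[:rate])
puts format("doubling time: %.2f years", fit[:doubling_time])
```

To use it on real output, replace the synthetic string with the contents of the file you redirected to, for example File.read("counts.tsv") (a hypothetical file name). The doubling time is what you would compare against a Moore's law curve.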

ADD COMMENTlink modified 12 months ago by RamRS30k • written 9.9 years ago by Neilfws48k
3
9.9 years ago by
Bio_X2Y3.8k
Ireland
Bio_X2Y3.8k wrote:

The SILVA website plots the growth of ribosomal RNA databases, e.g. http://www.arb-silva.de/documentation/background/release-104/

ADD COMMENTlink modified 12 months ago by RamRS30k • written 9.9 years ago by Bio_X2Y3.8k
3
9.9 years ago by
Suk2111.0k
state college
Suk2111.0k wrote:

SCOP lists the statistics of its release history in tabular form for the last 12 years.

Scop Classification Statistics

I agree with Khader that PDB has done an excellent job of reporting statistics on its entries. They have something called the histogram menu, which can easily generate statistics on current entries based on various criteria.

Example: Source Organism (Gene Source) Histogram

ADD COMMENTlink written 9.9 years ago by Suk2111.0k
3
9.9 years ago by
Gotgenes460
Bethesda, MD, USA
Gotgenes460 wrote:

There is a news article from October 2010 in Science that has a plot of the growth of human SNP data, particularly with regard to the 1000 Genomes project.

ADD COMMENTlink written 9.9 years ago by Gotgenes460

Bump! Not an OA article.

ADD REPLYlink written 9.9 years ago by Khader Shameer18k
3
9.9 years ago by
Rob30
Rob30 wrote:

A recent paper with an updated "Growth of GEO" plot:

Le et al. Cross-species queries of large gene expression databases. Bioinformatics (2010) vol. 26 (19) pp. 2416-23

ADD COMMENTlink written 9.9 years ago by Rob30