Question: Exponentially increasing genomes slide
 
13
 
 

I always see a slide in talks that shows the increasing number of genomes available in GenBank or another database. Where is this slide from? I have seen an outdated one from Genomes Online but nothing recent.

How can I find this graph and cite it for my own talk?

 
 
 

I guess the Genomes Online one is the best answer. Thank you for the boost on my question giovanni.

Maybe a better question would be, where are these data so that we can generate our own pretty graphs? But then again, I realize that the data are out there--you just have to find them and bring them together yourself!

That said, everyone gave really great answers, and I learned a lot from going through your links and what you said. Thank you all!

written 19 months ago by Lee Katz

6 answers

 
10
 
 
 

There was another really good graphic that Lincoln Stein used in his talk at Beyond The Genome last week. It is available from this paper:

The case for cloud computing in genome informatics

It is Figure 2 in there. It shows the growth of sequence data before NGS and the recent change in slope. It also marks where sequence production crossed storage: we have now passed the point where we can afford to store it all:

"The cost of genome sequencing is now decreasing several times faster than the cost of storage, promising that at some time in the not too distant future it will cost less to sequence a base of DNA than to store it on a hard disk....The various members of the genome informatics ecosystem are now facing a potential tsunami of genome data that will swamp our storage systems and crush our compute clusters."

Also at this meeting people were trying to change the meme from big scary data (deluge, tsunami, etc) to "data bonanza". People were attempting to use that--but they still seemed scared :)

 
 
 

I like the information added within this response very much.

written 19 months ago by Larry_Parnell
 

lol, I wasn't there, but you can count me with the scared ones. I do have ideas, and a plan, for dealing with a certain amount of data growth. But if this keeps going indefinitely, where will we end up? That's what I'm afraid of. Is Pac Bio going to save me from short reads? Or are they just going to multiply the data volume? Or both at the same time, plus a continuing flood of 2nd-gen data?

Or more generally - what is the new equilibrium going to look like, and when are we going to get there? The fact that I don't know is what makes me nervous.

written 19 months ago by Mitch Skinner
 

I'm denied access to the article from a University of California campus :(

written 14 months ago by Aleksandr Levchuk
 
 
8
 
 
 

You might want to have a look at the statistics from GOLD, the 'Genomes OnLine Database', as it reports statistics at the genome level rather than the base-pair level.

 
 
 

I just realized that they actually have the data in an Excel spreadsheet at the top of the page which is what I wanted. http://genomesonline.org/Gold_Stats.xls

written 19 months ago by Lee Katz
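For anyone who wants to build their own graph from a spreadsheet like that one, here is a minimal sketch of the bookkeeping. It assumes the per-year counts have been exported to CSV with two columns named `year` and `new_genomes`; those column names and every number below are made-up placeholders, not real GOLD statistics. It turns the yearly counts into a cumulative total and estimates a doubling time with a plain least-squares fit on a log2 scale.

```python
import csv
import io
import math

# Placeholder data for illustration only; NOT real GOLD numbers.
# The column names "year" and "new_genomes" are assumptions.
SAMPLE_CSV = """year,new_genomes
2005,100
2006,200
2007,400
2008,800
"""


def cumulative_totals(csv_text):
    """Return [(year, running total of genomes)] from per-year counts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    totals = []
    running = 0
    for row in reader:
        running += int(row["new_genomes"])
        totals.append((int(row["year"]), running))
    return totals


def doubling_time_years(totals):
    """Fit log2(total) against year by least squares; return 1 / slope."""
    xs = [year for year, _ in totals]
    ys = [math.log2(total) for _, total in totals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return 1.0 / slope


if __name__ == "__main__":
    totals = cumulative_totals(SAMPLE_CSV)
    for year, total in totals:
        print(year, total)
    print("approx. doubling time (years):", round(doubling_time_years(totals), 2))
```

The `totals` list can be handed to any plotting tool; a log-scaled y-axis makes it easy to see whether the growth really is exponential, since exponential growth shows up as a straight line.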
 
 
11
 
 
 
 
 

what a pity the graph is so damn ugly!

written 19 months ago by Yannick Wurm
 

I had to use Internet Explorer to get the numbers, but they suggest that the relative growth rate is decreasing, and that 2000 was an outlier year (and obviously 1983).

written 18 months ago by Andrewjgrimm
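To make "relative growth rate" concrete, here is a small sketch of the calculation: the year-over-year increase divided by the previous year's cumulative total. The numbers are invented placeholders, not the figures from the page discussed above, and `relative_growth` is just an illustrative helper name.

```python
def relative_growth(totals):
    """Return [(year, fractional increase over the previous year)]."""
    rates = []
    for (_, previous), (year, current) in zip(totals, totals[1:]):
        rates.append((year, (current - previous) / previous))
    return rates


# Placeholder cumulative totals for illustration only; NOT real data.
example_totals = [(2005, 100), (2006, 300), (2007, 600), (2008, 900)]

if __name__ == "__main__":
    for year, rate in relative_growth(example_totals):
        print(year, f"{rate:.0%}")
```

In this toy series the total keeps climbing every year while the relative rate falls each time, which is the kind of pattern a slowing relative growth rate describes.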
 
 
6
 
 

See Genome Project Statistic: http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html

Update: there is also the (rather incomplete) Wikipedia category "Sequenced genomes": http://en.wikipedia.org/wiki/Category:Sequenced_genomes

 
 
 
 
4
 
 

This one is helpful too

http://www.genome.gov/sequencingcosts/


 
 
 
 
1
 
 

I recommend using diArk for the latest genome files. The statistics are available at http://www.diark.org/diark/statistics

 
 
 

That's a bunch of neat plots, thanks for sharing this.

written 6 weeks ago by Khader Shameer
 