I like the information added within this response very much.
There was another really good graphic that Lincoln Stein used in his talk at Beyond The Genome last week. It is available from this paper, "The case for cloud computing in genome informatics"; it is Figure 2 there. It shows the slope of sequence data growth before NGS and how that slope has changed recently. It also shows the point where sequencing has overtaken storage: we have now passed the point where we can afford to store everything we produce. From the paper: "The cost of genome sequencing is now decreasing several times faster than the cost of storage, promising that at some time in the not too distant future it will cost less to sequence a base of DNA than to store it on a hard disk....The various members of the genome informatics ecosystem are now facing a potential tsunami of genome data that will swamp our storage systems and crush our compute clusters." Also at this meeting, people were trying to change the meme from big scary data (deluge, tsunami, etc.) to "data bonanza". People were attempting to use that, but they still seemed scared :)
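The claim that sequencing a base will eventually cost less than storing it can be made concrete with a back-of-the-envelope model. This is a minimal sketch under our own assumption (not the paper's model) that both per-base costs halve at fixed intervals. Setting cost_seq * 2^(-t/T_seq) equal to cost_store * 2^(-t/T_store) and solving for t gives t = log2(cost_seq / cost_store) / (1/T_seq - 1/T_store). The function name and all inputs are hypothetical illustration values, not measured data.

```python
import math


def crossover_years(seq_cost, store_cost, seq_halving, store_halving):
    """Years until sequencing a base costs less than storing it.

    Hypothetical model: each per-base cost falls as
        cost(t) = cost0 * 2 ** (-t / halving_time)
    Inputs are illustrative assumptions, not figures from the paper.
    """
    if seq_cost <= store_cost:
        return 0.0  # sequencing is already no more expensive than storage
    # Difference in decay rates (halvings per year).
    rate_gap = 1.0 / seq_halving - 1.0 / store_halving
    if rate_gap <= 0:
        # Sequencing cost is not falling faster, so the curves never cross.
        return math.inf
    # Solve seq_cost * 2**(-t/Ts) == store_cost * 2**(-t/Td) for t.
    return math.log2(seq_cost / store_cost) / rate_gap
```

For example, if sequencing a base cost 8 times as much as storing it, with made-up halving times of 6 months for sequencing and 18 months for storage, `crossover_years(8, 1, 0.5, 1.5)` gives about 2.25 years.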
lol, I wasn't there, but you can count me among the scared ones. I do have ideas, and a plan, for dealing with a certain amount of data growth. But if this keeps going indefinitely, where will we end up? That's what I'm afraid of. Is PacBio going to save me from short reads? Or are they just going to multiply the data volume? Or both at the same time, plus a continuing flood of 2nd-gen data? Or, more generally, what is the new equilibrium going to look like, and when are we going to get there? The fact that I don't know is what makes me nervous.
You might want to have a look at the statistics from GOLD, the 'Genomes OnLine Database', here, as it has statistics at the genome level, not the base-pair level.
I just realized that they actually have the data in an Excel spreadsheet at the top of the page, which is what I wanted: http://genomesonline.org/Gold_Stats.xls
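Once a spreadsheet like that is exported to CSV, turning it into a growth curve is mostly a matter of counting genomes per year and accumulating. The sketch below assumes one row per genome project with a year column; the column name `year` and the function name are hypothetical, since the actual layout of Gold_Stats.xls is not described here. Rows with a missing or non-numeric year are skipped.

```python
import csv
import io
from collections import Counter


def cumulative_by_year(rows, year_field="year"):
    """Return [(year, cumulative_count), ...] sorted by year.

    `rows` is any iterable of dicts (e.g. csv.DictReader output).
    `year_field` is a hypothetical column name; adjust it to the real export.
    """
    counts = Counter()
    for row in rows:
        value = (row.get(year_field) or "").strip()
        if value.isdigit():  # skip blank or malformed years
            counts[int(value)] += 1
    total = 0
    result = []
    for year in sorted(counts):
        total += counts[year]
        result.append((year, total))
    return result


# Tiny made-up sample standing in for a CSV export.
sample = "name,year\nA,1995\nB,1997\nC,1995\nD,\n"
print(cumulative_by_year(csv.DictReader(io.StringIO(sample))))
```

The resulting (year, total) pairs can be fed to whatever plotting tool you prefer.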
See the NCBI Genome Project Statistics: http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html. Update: there is also the (rather incomplete) Wikipedia category Sequenced genomes: http://en.wikipedia.org/wiki/Category:Sequenced_genomes
I recommend using diArk for the latest genome files. The stats can be found at http://www.diark.org/diark/statistics
I guess the Genomes OnLine one is the best answer. Thank you for the boost on my question, Giovanni.
Maybe a better question would be: where are these data, so that we can generate our own pretty graphs? But then again, I realize that the data are out there; you just have to find them and bring them together yourself!
That said, everyone gave really great answers, and I learned a lot from going through your links and what you said. Thank you all!