Question

What Are Your Most-Used Public Data Repositories?

9

10.7 years ago

Sean Davis 26k

If you were to catalog public data repositories that house public "omics" and other high-throughput data, what would you include? What are some of the public data repositories to which you have contributed or that you use regularly? In particular, I'd be interested in hearing about repositories or databases of raw omics data that are off-the-beaten-path but that are critical to your research.

Clarification: I am mainly interested in databases that collect and host omics data. I see, for example, that FlyBase seems to host some modENCODE RNA-seq data.

database • 5.1k views

0

I use Sequence Ontology, Gene Ontology, the NHLBI exome server, and pox.org.

10.7 years ago by Zev.Kronenberg 12k

score 5 · Answer 1 · 2013-08-20

10.7 years ago

Dan D 7.4k

Definitely the 1000 Genomes Project:

http://www.1000genomes.org/data#DataAccess

score 5 · Answer 2 · 2013-08-21

10.7 years ago

brentp 24k

By far we use the UCSC Genome Browser and its resources the most. I use the MySQL database quite a bit and use the browser to display our data overlaid on all the existing tracks.

http://genome.ucsc.edu/
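For anyone who has not tried the MySQL route, here is a minimal sketch of what such a lookup might look like. It only builds the `mysql` command line rather than running it. The host `genome-mysql.soe.ucsc.edu` and the anonymous `genome` user are what UCSC documents for its public server; the choice of the `hg19` database, the `refGene` table, and the helper names `refgene_query` and `mysql_command` are assumptions made for illustration, not anything from the answer above.

```python
import re
import shlex

# Public, read-only UCSC MySQL server (no password needed for this user).
UCSC_HOST = "genome-mysql.soe.ucsc.edu"
UCSC_USER = "genome"


def refgene_query(gene_symbol):
    """Build a SQL query for transcripts of one gene in the refGene table.

    The table and column names are an assumption about which track you want;
    other tracks live in other tables.
    """
    # Only allow plain gene-symbol characters, since the value is
    # interpolated directly into the SQL string.
    if not re.fullmatch(r"[A-Za-z0-9._-]+", gene_symbol):
        raise ValueError(f"unexpected characters in gene symbol: {gene_symbol!r}")
    return (
        "SELECT name, chrom, strand, txStart, txEnd "
        f"FROM refGene WHERE name2 = '{gene_symbol}'"
    )


def mysql_command(database, sql):
    """Return the argument list for the mysql command-line client."""
    return [
        "mysql",
        f"--user={UCSC_USER}",
        f"--host={UCSC_HOST}",
        "-A",        # skip auto-rehash so the client starts faster
        "-e", sql,   # run one statement and exit
        database,
    ]


if __name__ == "__main__":
    # hg19 is used here only as an example assembly.
    cmd = mysql_command("hg19", refgene_query("TP53"))
    print(shlex.join(cmd))
```

The printed command can be pasted into a shell that has the MySQL client installed, or the list can be handed to `subprocess.run` if you want the rows back in Python.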

score 4 · Answer 3 · 2013-08-21

10.7 years ago

Charles Warden 8.2k

At the risk of stating the obvious, I most often download data from SRA and ArrayExpress (which also has some NGS data).

GEO is also useful for searching for relevant projects because GEO provides links to the corresponding SRA data.

TCGA is also a commonly used resource, but you typically have to get special permission to access raw data.

1

+1 for TCGA. FWIW, TCGA's "special permission" generally just consists of letting them know what you're going to do with the data and filling out a form. They want the data to be easy to get and a community resource, but have to balance that against concerns about the release of clinical data.

10.7 years ago by Chris Miller 22k

score 4 · Answer 4 · 2013-08-22

UCSC mainly for me too. But I also use the InterMines for the modENCODE data (http://modencode.org/), and the BioMart interface to get to stuff I need that's not at UCSC. That connects me to a lot of sources.

My needs are pretty random: sometimes I'll need a big list of fly gene symbols, and then I'll need some cancer data. Another one I turn to is the International Cancer Genome Consortium: http://icgc.org/

For microbial data I often go to IMG to see what's available. http://img.jgi.doe.gov/

score 4 · Answer 5 · 2013-08-22

10.7 years ago

lwc628 ▴ 230

Ensembl (http://useast.ensembl.org/info/data/ftp/index.html). No?

I download all my references and annotations from here.
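If you script those downloads, a small helper along these lines can keep the release number in one place. This is only a sketch: the function name `ensembl_gtf_url` is made up for illustration, and the path layout (`release-N/gtf/<species>/<Species>.<assembly>.<N>.gtf.gz` under `ftp.ensembl.org/pub`) is an assumption based on how the Ensembl FTP site is currently organized, so check it against the FTP index page before relying on it.

```python
ENSEMBL_FTP_BASE = "https://ftp.ensembl.org/pub"


def ensembl_gtf_url(species, assembly, release):
    """Build the expected URL of an Ensembl GTF annotation file.

    species  -- lowercase directory name, e.g. "homo_sapiens"
    assembly -- assembly name used in the file name, e.g. "GRCh38"
    release  -- Ensembl release number, e.g. 110

    Nothing is downloaded here; the function only assembles the string.
    """
    # File names start with the species name with its first letter
    # capitalized, e.g. "Homo_sapiens".
    file_name = f"{species.capitalize()}.{assembly}.{release}.gtf.gz"
    return f"{ENSEMBL_FTP_BASE}/release-{release}/gtf/{species}/{file_name}"


if __name__ == "__main__":
    # Example values only; pick the release that matches your reference.
    print(ensembl_gtf_url("homo_sapiens", "GRCh38", 110))
```

Pinning the release this way makes it easier to keep the annotation in step with the reference FASTA downloaded from the same release.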

score 3 · Answer 6 · 2013-08-20

10.7 years ago

Stephen 2.8k

I use GEO frequently, and dbGaP only when I have to, because dbGaP access is painful.

score 1 · Answer 7 · 2013-08-22

10.7 years ago

zx8754 11k

We use the 1000 Genomes Project, UCSC Genome Browser tables, and TCGA, and we contribute to ICGC Prostate Cancer: http://icgc.org/icgc/cgp/70/508/71331
