Question: Quantifying Data Sets From Public Repositories
Agatha wrote (7.8 years ago):

Is there a (relatively automatic) method for quantifying the datasets in, for example, ArrayExpress or Gene Expression Omnibus (GEO)?

I would like to compute some statistics on the number of datasets per platform, tissue type, species, etc.

If there isn't one, what would be the best way to do it?

Example

ArrayExpress:

http://www.ebi.ac.uk/arrayexpress/browse.html

I would like to filter experiments from this type of listing and count the datasets, e.g. 4 ChIP-seq datasets for Mus musculus, 3 RNA-seq datasets for Homo sapiens, etc.

Tags: geo, data, database, microarray
Alastair Kerr (The University of Edinburgh, UK) wrote (7.8 years ago):

EDIT: I originally read your question as gene-centric rather than database-centric, hence this edit.

GEO has a good summary table you can use: the Summary page has 5 tabs that give summary data by platform, series, sample and organism, but not by tissue. For tissue, I would get in touch with the NCBI help desk; the data might be accessible via the SQL backend.
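A scripted alternative for the organism/series counts (not mentioned in the thread, so treat it as an assumption) is NCBI's E-utilities ESearch service, which can return just the number of hits for a query against the `gds` (GEO DataSets) database when `rettype=count` is set. The sketch below assumes the `[ORGN]` (organism) and `[ETYP]` (entry type, e.g. `gse`, `gds`, `gpl`, `gsm`) field tags from the GEO search documentation; verify them against the current help pages. It does not cover tissue, for the reason given above.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# NCBI E-utilities ESearch endpoint; db=gds searches GEO DataSets/Series records.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(organism: str, entry_type: str = "gse") -> str:
    """Build an ESearch URL that asks only for the number of matching GEO records.

    The [ORGN] and [ETYP] field tags are assumed from the GEO search docs.
    """
    term = f'"{organism}"[ORGN] AND {entry_type}[ETYP]'
    params = {"db": "gds", "term": term, "rettype": "count"}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_count(xml_payload) -> int:
    """Return the integer stored in the <Count> element of an ESearch response."""
    root = ET.fromstring(xml_payload)
    return int(root.findtext("Count"))


def fetch_count(organism: str, entry_type: str = "gse") -> int:
    """Query NCBI over the network and return the hit count (not called here)."""
    with urllib.request.urlopen(build_count_url(organism, entry_type)) as resp:
        return parse_count(resp.read())
```

Looping `fetch_count` over a list of organisms (and over `gse`/`gpl` entry types) gives a small per-species table; NCBI asks unauthenticated clients to stay at about three requests per second, so add a short sleep between calls.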

Old answer: This sounds very similar to the data found in BioGPS. Check the apps section for additional functionality and the downloads section for the underlying data.


@Alastair Kerr - I've had a brief look and will look more carefully. However, it seems you can search for individual genes, whereas I would like statistics on all the datasets containing expression data stored in the public repositories, like: 4 datasets, expression data, microarray, small non-coding RNAs; 2 datasets, expression data, microarray, small ncRNAs, Mus musculus; etc.

Reply by Agatha, 7.8 years ago

*Correction: the first example should read "small ncRNAs, Homo sapiens".

Reply by Agatha, 7.8 years ago

If you want a measure of the database rather than of individual genes, then yes. Let me edit the answer.

Reply by Alastair Kerr, 7.8 years ago

@Alastair Kerr - it is just perfect, thanks. I couldn't find anything similar for ArrayExpress, though. Do you know of anything?

Reply by Agatha, 7.8 years ago

@Alastair Kerr - Also, I assume that exporting to a CSV file and then processing it is the only way to do this, right?

Reply by Agatha, 7.8 years ago
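For the export-then-process route, the counting step itself is short. The sketch below assumes an exported listing with a header row; the column names `Type` and `Organism` and the sample rows are hypothetical, so rename them to match the real export (and pass a tab as `delimiter` if the file is tab-separated). It filters rows on exact column values, then counts the rest per (type, organism) pair.

```python
import csv
import io
from collections import Counter


def count_datasets(text, keys=("Type", "Organism"), filters=None, delimiter=","):
    """Count rows of an exported experiment listing, grouped by the `keys` columns.

    `keys` and `filters` use hypothetical column names; adjust them to the
    headers of the actual export.
    """
    filters = filters or {}
    counts = Counter()
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        # Keep only rows whose columns match every requested filter value.
        if all(row.get(col) == value for col, value in filters.items()):
            counts[tuple(row[k] for k in keys)] += 1
    return counts


# Hypothetical export with made-up accessions, for illustration only.
SAMPLE = """Accession,Type,Organism
E-HYPO-1,ChIP-seq,Mus musculus
E-HYPO-2,ChIP-seq,Mus musculus
E-HYPO-3,RNA-seq,Homo sapiens
"""

if __name__ == "__main__":
    counts = count_datasets(SAMPLE)
    print(counts[("ChIP-seq", "Mus musculus")])  # 2
    # Restrict to one species before counting.
    print(count_datasets(SAMPLE, filters={"Organism": "Homo sapiens"}))
```

Using `csv.DictReader` rather than splitting lines by hand keeps quoted fields (e.g. descriptions containing commas) intact.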

As far as I know, yes; I have never seen summary data for ArrayExpress. I would contact their helpdesk (miamexpress@ebi.ac.uk) first, as they may offer programmatic access.

Reply by Alastair Kerr, 7.8 years ago

@Alastair Kerr - thanks.

Reply by Agatha, 7.8 years ago

There is programmatic access to ArrayExpress: http://www.ebi.ac.uk/fg/doc/help/programmatic_access.html. However, it isn't great: they told me that they've exposed their internal API for public use, but it is far from production-ready.

Reply by Neilfws, 7.8 years ago
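If that programmatic access returns an XML list of experiments, tallying it locally is straightforward. The sketch below assumes (unverified) one `<experiment>` element per record with one or more `<experimenttype>` and `<organism>` children; the sample payload is invented for illustration, so check the element names against a real response first. An experiment listing several organisms or types is counted once for each combination.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import product


def tally_experiments(xml_payload):
    """Count experiments per (experiment type, organism) pair.

    Element names are assumptions about the response layout, not a documented schema.
    """
    root = ET.fromstring(xml_payload)
    counts = Counter()
    for exp in root.iter("experiment"):
        types = [(e.text or "").strip() for e in exp.findall("experimenttype")] or ["unknown"]
        organisms = [(e.text or "").strip() for e in exp.findall("organism")] or ["unknown"]
        # One count per type/organism combination listed on the experiment.
        for pair in product(types, organisms):
            counts[pair] += 1
    return counts


# Hypothetical response fragment with made-up accessions.
SAMPLE_XML = """<experiments>
  <experiment><accession>E-HYPO-1</accession>
    <organism>Mus musculus</organism><experimenttype>ChIP-seq</experimenttype></experiment>
  <experiment><accession>E-HYPO-2</accession>
    <organism>Mus musculus</organism><experimenttype>ChIP-seq</experimenttype></experiment>
  <experiment><accession>E-HYPO-3</accession>
    <organism>Homo sapiens</organism><experimenttype>RNA-seq</experimenttype></experiment>
</experiments>"""

if __name__ == "__main__":
    for (exp_type, organism), n in tally_experiments(SAMPLE_XML).most_common():
        print(f"{n} datasets, {exp_type}, {organism}")
```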
Powered by Biostar version 2.3.0