Question: How To Compare The Expression Of Two Genes In Large Microarray Datasets Such As ArrayExpress And GEO
Dror (Israel) wrote, 8.0 years ago:

Is there an easy, programmatic way to extract large datasets from online repositories of microarray data, such as GEO and ArrayExpress, and assess the co-expression of two genes across large-scale expression experiments? In more detail: I suspect that two genes should have a similar expression pattern in mammals. So I want to scan all the microarray experiments in which these two genes appear and compare their expression patterns over a variety of experiments.

I would prefer to do it in Python/Biopython, but Perl would be OK too.

Neilfws (Sydney, Australia) wrote, 8.0 years ago:

I've spent some time on programmatic mining of GEO and ArrayExpress. I wish the answer were "yes there is", but it is not.

First, both databases have APIs. The API for the ArrayExpress Gene Atlas is described here. It is rather limited in the queries it supports and somewhat buggy; in fact, it is an internal API exposed to the outside world and is not really ready for general use.

GEO is searchable using NCBI's Entrez Programming Utilities (EUtils); programmatic access is described here. I have compiled lists of the terms that you can use to search the Entrez databases at this link: take a look at the gds, geoprofiles and geo text files. The major bioinformatics toolkits all have EUtils libraries: here are links for BioPerl, Biopython and BioRuby. I know the latter best; a simple query might look like this:

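#!/usr/bin/ruby
require "rubygems"
require "bio"

# NCBI asks EUtils users to identify themselves: use your own address
Bio::NCBI.default_email = "me@me.com"
ncbi = Bio::NCBI::REST.new

# query GEO DataSets ("gds") for human series (GSE) with CEL supplementary files
search = ncbi.esearch("Homo sapiens[ORGN] AND GSE[ETYP] AND cel[suppFile]",
                      { "db" => "gds", "retmax" => 200 })

# esearch returns an array of matching GDS IDs
puts search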

That will find GEO series for human studies with supplementary CEL file data.
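To see roughly what such a query looks like on the wire, here is a minimal sketch using only Ruby's standard library that builds the corresponding ESearch URL. It assumes NCBI's public ESearch endpoint and the standard `db`, `term`, `retmax` and `email` parameters; the helper name `esearch_url` is hypothetical, and the sketch only constructs the URL (fetching it, e.g. with `Net::HTTP`, is left out so it runs offline).

```ruby
require "uri"

# NCBI's public ESearch endpoint (part of the E-utilities)
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical helper: build an ESearch URL for an Entrez database and query term.
# URI.encode_www_form percent-encodes the brackets and turns spaces into "+".
def esearch_url(db, term, retmax: 20, email: nil)
  params = { "db" => db, "term" => term, "retmax" => retmax }
  params["email"] = email if email
  "#{ESEARCH}?#{URI.encode_www_form(params)}"
end

# Same query as the BioRuby example: human GEO series with CEL supplementary files
gds_url = esearch_url("gds", "Homo sapiens[ORGN] AND GSE[ETYP] AND cel[suppFile]",
                      retmax: 200, email: "me@me.com")
puts gds_url
```

Passing "geoprofiles" instead of "gds" as the database targets GEO Profiles; the search fields valid for each database are in the term lists mentioned above.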

You will encounter numerous issues with GEO, particularly (1) poorly annotated samples and errors (e.g. typos), because annotation standards are not enforced, and (2) expression values that may or may not be normalised (and, if they are, normalised in a variety of ways). So brace yourself for lots of manual curation.
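Because values from different series may be normalised differently, one hedge is to compare the two genes only within each series and then look at how the per-series correlations are distributed. The sketch below assumes you have already pulled each series into a hash mapping a (hypothetical) series ID to two equal-length arrays of values, one per gene, over that series' samples; the IDs and numbers are made up for illustration. Pearson correlation is unchanged by a positive linear rescaling of a gene's values, and Spearman (Pearson on ranks) is unchanged by any increasing monotone transformation such as a log, which is why both are computed.

```ruby
# Pearson correlation of two equal-length numeric arrays
def pearson(x, y)
  raise ArgumentError, "need equal lengths > 1" unless x.size == y.size && x.size > 1
  n  = x.size.to_f
  mx = x.sum / n
  my = y.sum / n
  sxy = x.zip(y).sum { |a, b| (a - mx) * (b - my) }
  sxx = x.sum { |a| (a - mx)**2 }
  syy = y.sum { |b| (b - my)**2 }
  sxy / Math.sqrt(sxx * syy) # NaN if either gene is constant in this series
end

# 1-based ranks, with tied values sharing their average rank
def ranks(v)
  sorted = v.each_with_index.sort_by { |val, _| val }
  r = Array.new(v.size)
  i = 0
  while i < sorted.size
    j = i
    j += 1 while j + 1 < sorted.size && sorted[j + 1][0] == sorted[i][0]
    avg = (i + j) / 2.0 + 1 # average rank of the tied block
    (i..j).each { |k| r[sorted[k][1]] = avg }
    i = j + 1
  end
  r
end

# Spearman correlation = Pearson correlation of the ranks
def spearman(x, y)
  pearson(ranks(x), ranks(y))
end

# Hypothetical input: series ID => [gene A values, gene B values] over the same samples
series = {
  "SERIES_1" => [[5.1, 6.3, 7.2, 8.0], [2.0, 2.9, 3.8, 4.1]],
  "SERIES_2" => [[120.0, 340.0, 90.0, 410.0], [15.0, 12.0, 44.0, 50.0]]
}

series.each do |id, (a, b)|
  printf("%s  pearson=%.3f  spearman=%.3f\n", id, pearson(a, b), spearman(a, b))
end
```

A consistently high correlation across many independent series is more convincing than one high value; series with very few samples are better skipped.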

In fact, if you are interested in only a few genes, you may decide that programmatic access is more trouble than it is worth and simply explore via the web interfaces. At the NCBI, searching GEO Profiles can be useful for a gene-centric view. The ArrayExpress Gene Atlas interface starts here.

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 8.0 years ago:

I think the people behind Madcow have already computed this kind of data:

Madcow is a web tool questioning a coexpression data base with experiment filtering and several levels of significance. Results can be filtered, compared and annotated by identification of statistically over-represented Gene Ontology terms. Moreover, the user may visualize a coexpression network from the results by using the Cytoscape tool.
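Madcow's own statistics are not described here, so purely as an illustration of what a "level of significance" for a coexpression value can mean: a common approach is Fisher's z-transform of a Pearson r computed over n samples, z = atanh(r) * sqrt(n - 3), which is approximately standard normal when the true correlation is zero (assuming roughly bivariate-normal data). The sketch below turns that into an approximate two-sided p-value with Ruby's built-in `Math.erfc`; the helper name `correlation_p_value` is hypothetical.

```ruby
# Approximate two-sided p-value for H0: true correlation = 0,
# using Fisher's z-transform of a sample Pearson r over n samples.
def correlation_p_value(r, n)
  raise ArgumentError, "need n > 3" unless n > 3
  raise ArgumentError, "r must be in (-1, 1)" unless r.abs < 1
  z = Math.atanh(r) * Math.sqrt(n - 3)
  # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2))
  Math.erfc(z.abs / Math.sqrt(2))
end

# The same r is far more significant with more samples
[[0.3, 10], [0.3, 100], [0.8, 12]].each do |r, n|
  printf("r=%.2f n=%3d  p~%.2g\n", r, n, correlation_p_value(r, n))
end
```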
