Question: Finding Gene Coexpression in GEO?
User 7065 wrote (9.4 years ago):

I have been tasked with finding gene coexpression in GEO. I have a few problems with this, and I suspect we are going about it the wrong way.

First, we're using Affy probe sets, so each gene can be represented by multiple probe sets. Is there a different set of probes we should be using to get expression data for an entire gene?
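One common workaround, shown here only as a sketch, is to collapse probe sets to one row per gene, for example by keeping the probe set with the highest mean expression. The `probe_to_gene` mapping, the probe IDs and the max-mean rule are assumptions for illustration; custom probe set definition files are another option.

```python
import numpy as np

def collapse_max_mean(expr, probe_ids, probe_to_gene):
    """Keep, for each gene, the probe set with the highest mean expression.

    expr: 2-D array, rows = probe sets, columns = samples (assumed log2 scale).
    probe_ids: probe-set IDs, one per row of expr.
    probe_to_gene: hypothetical annotation dict, probe-set ID -> gene ID;
        probe sets missing from the dict are dropped.
    """
    best = {}  # gene -> (mean expression, row index)
    for row, pid in enumerate(probe_ids):
        gene = probe_to_gene.get(pid)
        if gene is None:
            continue
        m = float(expr[row].mean())
        if gene not in best or m > best[gene][0]:
            best[gene] = (m, row)
    genes = sorted(best)
    gene_expr = np.array([expr[best[g][1]] for g in genes])
    return genes, gene_expr

# Made-up example: two probe sets map to GENE_A, one to GENE_B.
expr = np.array([[5.0, 6.0, 7.0],
                 [8.0, 8.5, 9.0],
                 [3.0, 3.0, 3.0]])
probe_ids = ["p1_at", "p2_s_at", "p3_at"]
probe_to_gene = {"p1_at": "GENE_A", "p2_s_at": "GENE_A", "p3_at": "GENE_B"}
genes, gene_expr = collapse_max_mean(expr, probe_ids, probe_to_gene)
```

Max-mean is only one heuristic; averaging probe sets or keeping them all and correlating at the probe-set level are equally valid choices depending on the question.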

Our plan is to use "cor" in R to compute the correlation coefficients between probes across the samples in a GEO DataSet. I have been told that this is not the correct way to find coexpression.
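For reference, a minimal NumPy sketch of the same computation (in R, with probes as the rows of the matrix, this corresponds to `cor(t(expr))`). The toy matrix and the `top_partners` helper are hypothetical; ranking by r lists positively coexpressed partners first.

```python
import numpy as np

# Toy log2 expression matrix: rows = probe sets, columns = samples (made-up values).
names = ["probe_A", "probe_B", "probe_C"]
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],   # probe_A
    [2.0, 4.0, 6.0, 8.0],   # probe_B: same shape as A, scaled
    [4.0, 3.0, 2.0, 1.0],   # probe_C: reversed relative to A
])

# np.corrcoef treats each row as a variable, so r is the probe-by-probe
# Pearson correlation matrix computed across samples.
r = np.corrcoef(expr)

def top_partners(r, names, query, k=2):
    """Return the k probes most positively correlated with `query`."""
    i = names.index(query)
    order = np.argsort(-r[i])           # highest correlation first
    return [(names[j], float(r[i, j])) for j in order if j != i][:k]

partners = top_partners(r, names, "probe_A")
```

Whether a single Pearson correlation over all samples is meaningful depends on the biological context of those samples.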

Thank you for your help.


What is the biological question you want to answer? I assume that you want to find the coexpression of genes for some reason and in some biologic context. What is the reason and the biologic context?

(Comment by Sean Davis, 9.4 years ago)
W Langdon wrote (9.4 years ago):

The RNAnet tool does this. It comes with human GEO Affymetrix HG-U133 Plus 2.0 data preloaded and lets you display correlations between expression profiles across thousands of samples in a few seconds. You can use Affy IDs or, where possible, it maps Ensembl IDs to Affy probe sets.

For example, load ENSE00001045180 into the first text box and press ENSE lookup (this gives the list of corresponding Affy IDs). Press Heatmap to get a heatmap of 171 correlations (approx. 2 seconds). Click on a yellow heatmap cell to get a scatter plot (press plot!!!). Drag the blue crosshairs to get a link into GEO for each data point.
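The heatmap step amounts to computing every pairwise correlation within a list of probe sets. A rough NumPy sketch, not RNAnet's actual code: with n probe sets there are n(n-1)/2 unique pairs, read from the upper triangle of the correlation matrix. The random data and the `unique_pairs` helper are made up for illustration.

```python
import numpy as np

def unique_pairs(expr, names):
    """Return (name_i, name_j, r) for every unordered pair of rows, strongest |r| first."""
    r = np.corrcoef(expr)
    i, j = np.triu_indices(len(names), k=1)   # strictly above the diagonal
    pairs = [(names[a], names[b], float(r[a, b])) for a, b in zip(i, j)]
    return sorted(pairs, key=lambda p: -abs(p[2]))

rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 40))             # 5 made-up probe sets x 40 samples
names = [f"probe_{k}" for k in range(5)]
pairs = unique_pairs(expr, names)           # 5 * 4 / 2 = 10 pairs
```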

Some of the graphical displays use canvas graphics, so you may need the Firefox web browser.



European Union wrote (9.4 years ago):

Probe sets change with every version of every platform; over time, new versions improve genome coverage. In your analysis, keep every probe that passes your quality control thresholds.
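As a sketch of such a filter, assuming log2-scale data: the thresholds `min_expr`, `min_frac` and `min_sd` below are hypothetical placeholders, not recommended values. Dropping flat probe sets also matters for coexpression, because Pearson correlation is undefined for a row with zero variance.

```python
import numpy as np

def qc_filter(expr, min_expr=5.0, min_frac=0.2, min_sd=0.1):
    """Boolean mask of probe sets (rows) passing simple, hypothetical QC thresholds.

    min_expr: log2 intensity treated as "expressed" (assumed value).
    min_frac: fraction of samples in which the probe set must exceed min_expr.
    min_sd:   minimum standard deviation across samples.
    """
    expressed = (expr > min_expr).mean(axis=1) >= min_frac
    variable = expr.std(axis=1) >= min_sd
    return expressed & variable

expr = np.array([
    [6.0, 7.0, 8.0, 9.0],   # expressed and variable -> kept
    [2.0, 2.5, 3.0, 2.0],   # never above 5 -> dropped
    [7.0, 7.0, 7.0, 7.0],   # expressed but flat -> dropped
])
mask = qc_filter(expr)
kept = expr[mask]
```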

You will face two issues: i) how to get comparable datasets, and ii) how to analyze them.

i) GEO is a gold mine, but pooling different expression datasets together is risky in terms of bias. I was impressed by a presentation of S. Bicciato's work on GEO data, and I suggest you look at "Novel definition files for human GeneChips based on GeneAnnot" [PMID:18005434] and "Strategies for comparing gene expression profiles from different microarray platforms: application to a case-control experiment" [PMID:16624241] as a starting point if you want to go this way. Otherwise, you could turn to a meta-analysis approach, which avoids the bias of merging data, but I imagine that could warrant a new thread of its own...
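If you take the meta-analysis route, one standard option (not necessarily what the cited papers do) is to compute a probe pair's correlation separately in each dataset and combine the values with a fixed-effect average of Fisher z-transformed correlations, weighted by n - 3. A sketch under those assumptions; each correlation must lie strictly between -1 and 1, each dataset needs more than 3 samples, and the example numbers are made up.

```python
import numpy as np

def combine_correlations(rs, ns):
    """Fixed-effect combination of per-dataset Pearson correlations (Fisher z).

    rs: correlation of the same probe pair computed within each dataset.
    ns: number of samples in each dataset (each must be > 3).
    Returns (pooled r, z-score of the pooled estimate).
    """
    rs = np.asarray(rs, dtype=float)
    ns = np.asarray(ns, dtype=float)
    z = np.arctanh(rs)          # Fisher z-transform, approximately normal
    w = ns - 3.0                # inverse of the approximate variance 1 / (n - 3)
    z_bar = np.sum(w * z) / np.sum(w)
    se = 1.0 / np.sqrt(np.sum(w))
    return float(np.tanh(z_bar)), float(z_bar / se)

r_pooled, z_score = combine_correlations([0.6, 0.5, 0.7], [20, 35, 50])
```

A random-effects model is the usual alternative when the datasets are expected to disagree beyond sampling noise.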

ii) If your analysis concerns a status variable (e.g. treated vs. untreated), it is better to test for differentially expressed probes/genes (see the good documentation that comes with the limma package) and then represent your results with a heatmap (this step involves correlations).
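limma itself is an R/Bioconductor package that uses a moderated t-statistic. As a rough stand-in only, this NumPy sketch computes an ordinary per-probe Welch t-statistic and ranks probes by |t| to pick candidates for a heatmap. The toy data and the `welch_t` name are hypothetical, and this is not limma's method.

```python
import numpy as np

def welch_t(expr, group):
    """Per-probe Welch t-statistic for a two-level status variable.

    expr:  rows = probe sets, columns = samples (log2 scale assumed).
    group: boolean array, True for e.g. "treated", False for "untreated".
    NOTE: plain Welch t-test, not limma's moderated t-statistic.
    """
    group = np.asarray(group, dtype=bool)
    a, b = expr[:, group], expr[:, ~group]
    va = a.var(axis=1, ddof=1) / a.shape[1]
    vb = b.var(axis=1, ddof=1) / b.shape[1]
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(va + vb)

expr = np.array([
    [8.0, 8.2, 7.9, 5.0, 5.1, 4.9],   # clearly higher in the treated group
    [6.0, 6.1, 5.9, 6.0, 6.2, 5.8],   # no difference
])
group = np.array([True, True, True, False, False, False])
t = welch_t(expr, group)
ranking = np.argsort(-np.abs(t))      # probe indices, most different first
```

With few samples per group, the moderated statistic in limma is generally preferable because it borrows variance information across probes.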

If you just want to build gene profiles across your samples, cor can measure the similarity (or "distance", e.g. 1 - r) between each pair of probes/genes, but you should then cluster the results in some way. Alternatively, the whole job can be done by pre-packaged implementations of k-means, hierarchical clustering, or many other methods.
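A small sketch linking the two ideas, using k-means as the example method: after z-scoring each row (population SD), the squared Euclidean distance between two rows of length n equals 2n(1 - r), so plain k-means on z-scored profiles groups probes by correlation. The toy data, `zscore_rows` and `kmeans` are hypothetical helpers, not a packaged implementation.

```python
import numpy as np

def zscore_rows(expr):
    """Center and scale each row (ddof=0) so that
    ||z_i - z_j||^2 == 2 * n * (1 - r_ij)."""
    mu = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, keepdims=True)
    return (expr - mu) / sd

def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new = np.array([x[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

expr = np.array([[1, 2, 3, 4, 5],     # rising
                 [2, 3, 5, 6, 7],     # rising
                 [5, 4, 3, 2, 1],     # falling
                 [9, 7, 6, 4, 2]],    # falling
                dtype=float)
z = zscore_rows(expr)
labels = kmeans(z, 2)
```

Hierarchical clustering on a 1 - r distance matrix is the other common choice and avoids having to pick k in advance.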

Powered by Biostar version 2.3.0