Question

Finding Gene Coexpression in GEO?

Asked 13.9 years ago by User 7065

I am tasked with finding the coexpression of genes in GEO. I have a few problems with this and some suspicions that we are going about this in the wrong way.

Firstly, we're using Affymetrix (Affy) arrays, so each gene can be represented by multiple probe sets. Is there a different set of probes we should be using to get the expression data for an entire gene?
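One common workaround is to collapse each gene's probe sets into a single profile, either by averaging them sample by sample or by keeping only the most variable probe set. Below is a minimal sketch of both strategies, assuming you already have a probe-set-to-gene mapping from the platform annotation. The function name collapse_to_genes, the probe IDs, the gene names and the values are all hypothetical. Neither strategy is the only (or necessarily the best) choice; remapped probe set definitions are another route.

```python
from statistics import mean, pvariance


def collapse_to_genes(expr, probe_to_gene, method="mean"):
    """Collapse probe-set profiles into one profile per gene.

    expr: dict of probe-set ID -> list of expression values (one per sample)
    probe_to_gene: dict of probe-set ID -> gene symbol
    method: "mean" averages a gene's probe sets sample by sample;
            "max_var" keeps the single probe set with the largest variance.
    """
    by_gene = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:  # probe sets without a gene mapping are dropped
            continue
        by_gene.setdefault(gene, []).append(values)

    collapsed = {}
    for gene, profiles in by_gene.items():
        if method == "mean":
            collapsed[gene] = [mean(sample) for sample in zip(*profiles)]
        elif method == "max_var":
            collapsed[gene] = max(profiles, key=pvariance)
        else:
            raise ValueError(f"unknown method: {method}")
    return collapsed


# Hypothetical log2 values for three samples.
expr = {
    "probeA_at": [7.0, 8.0, 9.0],
    "probeB_at": [5.0, 5.0, 5.0],
    "probeC_at": [6.0, 6.5, 7.0],
}
probe_to_gene = {"probeA_at": "GENE1", "probeB_at": "GENE1", "probeC_at": "GENE2"}
print(collapse_to_genes(expr, probe_to_gene))
print(collapse_to_genes(expr, probe_to_gene, method="max_var"))
```

With these toy values, averaging gives GENE1 the profile [6.0, 6.5, 7.0], while "max_var" keeps probeA_at's [7.0, 8.0, 9.0]. Averaging in a flat, unresponsive probe set dampens the signal, which is one reason some people prefer to keep a single probe set per gene.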

Our plan is to use "cor" in R to compute the correlation coefficients between probes across the samples in a GEO DataSet. However, I have been told that this is not the correct way to find coexpression.
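For reference, the plan described above amounts to computing a Pearson correlation for every pair of probe-set profiles across samples. In R that is roughly cor(t(x)) for a probes-by-samples matrix x, since "cor" correlates columns. The sketch below does the same in plain Python so the arithmetic is visible; the function names, probe IDs and values are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant profile
    return sxy / sqrt(sxx * syy)


def correlation_matrix(profiles):
    """Correlate every pair of rows (probe sets) across columns (samples)."""
    ids = list(profiles)
    return {(i, j): pearson(profiles[i], profiles[j]) for i in ids for j in ids}


# Hypothetical probe sets measured in four samples.
profiles = {
    "p1_at": [1.0, 2.0, 3.0, 4.0],
    "p2_at": [2.0, 4.0, 6.0, 8.0],
    "p3_at": [4.0, 3.0, 2.0, 1.0],
}
m = correlation_matrix(profiles)
print(m[("p1_at", "p2_at")], m[("p1_at", "p3_at")])
```

Here p1_at and p2_at rise together (r = 1), while p3_at mirrors p1_at (r = -1). Pearson correlation is sensitive to outliers; R's "cor" also accepts method = "spearman" for a rank-based alternative.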

Thank you for your help.

What is the biological question you want to answer? I assume that you want to find the coexpression of genes for some reason and in some biologic context. What is the reason and the biologic context?

(Comment by Sean Davis, 13.9 years ago)

score 2 · Answer 1 · 2011-08-23

The RNAnet tool does this. It has Human GEO Affymetrix HG-U133 Plus 2.0 data preloaded and can display the correlation between expression profiles across thousands of samples in a few seconds. You can enter Affy IDs directly, or, where a mapping exists, it maps Ensembl IDs to Affy probe sets.

For example, open http://bioinformatics.essex.ac.uk/users/wlangdon/rnanet/correlation.html and load ENSE00001045180 into the first text box. Pressing "ENSE lookup" gives the list of corresponding Affy IDs, and pressing "Heatmap" then gives a heatmap of 171 correlations (approx. 2 seconds). Clicking on a yellow heatmap cell gives a scatter plot (press "plot"!). Drag the blue crosshairs to get a link into GEO for each data point.

Some of the graphical displays use HTML canvas graphics, so you may need the Firefox web browser.

Bill


score 0 · Answer 2 · 2011-08-23

Probe set definitions change with each version of a platform; over time, the manufacturers improve their genome coverage. In your analysis, keep every probe set that passes your quality control thresholds.
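A minimal sketch of such filtering, assuming log2-scale values and two placeholder criteria: a minimum mean expression and a minimum variance across samples. The thresholds and the name filter_probes are hypothetical; real QC criteria (for example, MAS5 detection calls or array-level quality metrics) depend on your data and pipeline.

```python
from statistics import mean, pvariance


def filter_probes(expr, min_mean=5.0, min_var=0.1):
    """Keep probe sets whose mean and variance reach placeholder thresholds.

    expr: dict of probe-set ID -> list of log2 expression values (one per sample).
    min_mean and min_var are hypothetical cut-offs; tune them to your data.
    """
    return {
        probe: values
        for probe, values in expr.items()
        if mean(values) >= min_mean and pvariance(values) >= min_var
    }


expr = {
    "a_at": [7.0, 8.0, 9.0],  # expressed and variable: kept
    "b_at": [3.0, 3.0, 3.2],  # low expression: dropped
    "c_at": [6.0, 6.0, 6.0],  # no variation across samples: dropped
}
print(sorted(filter_probes(expr)))
```

A variance filter is also useful before coexpression work, because the correlation of a constant profile is undefined and near-constant profiles are dominated by noise.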

You will face two issues: (i) how to obtain comparable datasets, and (ii) how to analyze them.

i) GEO is a gold mine, but pooling different expression datasets is risky in terms of bias. I was impressed by a presentation of S. Bicciato's work on GEO data, and if you want to go this way I suggest you look at "Novel definition files for human GeneChips based on GeneAnnot" [PMID:18005434] and "Strategies for comparing gene expression profiles from different microarray platforms: application to a case-control experiment" [PMID:16624241] as a starting point. Alternatively, you could turn to a meta-analysis approach, which avoids the bias of merging data, but I imagine that topic deserves a thread of its own...
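If you take the meta-analysis route, one standard technique (not necessarily what the cited papers do) is to compute the correlation of a gene pair separately within each dataset and pool the results with Fisher's z-transformation, weighting each dataset by n - 3, its sample count minus three. Below is a minimal fixed-effect sketch; the function name pooled_correlation and the example inputs are hypothetical.

```python
from math import atanh, sqrt, tanh


def pooled_correlation(results, z_crit=1.96):
    """Fixed-effect pooling of per-dataset correlations via Fisher's z.

    results: list of (r, n) pairs, one per dataset, where r is the correlation
    of the same gene pair computed within that dataset (-1 < r < 1) and n is
    that dataset's number of samples.
    Returns (pooled r, lower bound, upper bound) of an approximate 95% interval.
    """
    num = den = 0.0
    for r, n in results:
        if n <= 3:
            raise ValueError("each dataset needs more than 3 samples")
        w = n - 3  # var(atanh(r)) is approximately 1 / (n - 3)
        num += w * atanh(r)
        den += w
    z = num / den
    se = 1 / sqrt(den)
    return tanh(z), tanh(z - z_crit * se), tanh(z + z_crit * se)


# Hypothetical: the same gene pair correlated within two separate GEO series.
print(pooled_correlation([(0.6, 20), (0.4, 50)]))
```

Because each correlation is computed within its own dataset, batch and platform differences between datasets do not enter the individual r values. A fixed-effect model assumes every dataset estimates the same underlying correlation; if the datasets differ biologically, a random-effects model is the more cautious choice.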

ii) If your analysis relates to a status variable (e.g. treated vs. untreated), it is better to test for differentially expressed probes/genes (see the good documentation that comes with the limma package) and then represent your results with a heatmap (this step involves computing correlations).

If you just want to build gene profiles across your samples, cor can do the job of measuring the "distance" between each pair of probes/genes, but you should then cluster the results in some way. Alternatively, the whole job can be done by pre-packaged implementations of k-means, hierarchical clustering, or many other methods.
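A minimal sketch of the "cor, then cluster" idea, assuming 1 - r as the distance and average-linkage agglomerative clustering stopped at a chosen number of clusters k. In R the equivalent is roughly hc <- hclust(as.dist(1 - cor(t(x))), method = "average") followed by cutree(hc, k = 2). The plain-Python version below is only for illustration: the function names, gene names and values are hypothetical, and this naive loop is far too slow for thousands of probes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_linkage(profiles, k):
    """Merge the two closest clusters until k remain, using 1 - r as distance."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ids = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b]) for a in ids for b in ids}
    clusters = [[i] for i in ids]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                pairs = [dist[a, b] for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # j > i, so index i stays valid
        del clusters[j]
    return clusters


profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 9.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.0, 1.0],
}
print(average_linkage(profiles, k=2))
```

With these toy profiles, the two rising genes (g1, g2) end up in one cluster and the two falling genes (g3, g4) in the other. Using 1 - r means that anti-correlated genes are treated as maximally distant; use 1 - |r| instead if you want them grouped together.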