Question

Microarray Meta-Analysis

5

12.6 years ago

Nasir ▴ 50

I would be very grateful for suggestions on how best to tackle this project. I want to find out which SLC transporter transcripts are most highly expressed in the normal human hippocampus. For this, I plan to use publicly available Affymetrix U133 Plus 2.0 microarray data from ArrayExpress/GEO, taking the CEL files or normalized data for the normal/control tissues only. How can I combine data from different experiments/studies (all done on the U133 Plus 2.0 platform) to get the most reliable estimate and ranked list of transcript abundance in normal human hippocampus? Thank you!

microarray meta affymetrix • 4.4k views

updated 12.6 years ago by Qdjm 1.9k • written 12.6 years ago by Nasir ▴ 50

score 4 · Answer 1 · 2011-09-07

The most important step in your analysis is probably to rerun normalization and probe summarization from the CEL files. There are two reasons. First, this lets you use a consistent array-design description for all arrays during pre-processing. Second, many normalization methods (e.g. quantile normalization) scale each array in the context of all chips in the experiment. If you take arrays out of that context and put them into a new one, the absolute values are no longer comparable. I would therefore recommend collecting all CEL files into a 'virtual experiment' and running normalization and summarization on them using the latest array description file (.adf).
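To illustrate why quantile-normalized values depend on the other chips in the experiment, here is a minimal Python sketch. It is a toy re-implementation for illustration only, not the RMA pipeline used on real CEL files; the function name `quantile_normalize` and all intensities are made up.

```python
from statistics import mean

def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length chips (one list per chip).

    Each value is replaced by the mean, across all chips, of the values
    that share its rank. Ties are broken by position (enough for a sketch).
    """
    n_probes = len(arrays[0])
    # Sort every chip, then average across chips at each rank position.
    sorted_chips = [sorted(a) for a in arrays]
    reference = [mean(chip[i] for chip in sorted_chips) for i in range(n_probes)]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity on this chip.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# The same (hypothetical) chip normalized inside two different experiments.
chip = [2.0, 8.0, 4.0, 6.0]
study_a = [chip, [1.0, 3.0, 2.0, 4.0]]
study_b = [chip, [10.0, 30.0, 20.0, 40.0]]

chip_in_a = quantile_normalize(study_a)[0]
chip_in_b = quantile_normalize(study_b)[0]
print(chip_in_a)  # [1.5, 6.0, 3.0, 4.5]
print(chip_in_b)  # [6.0, 24.0, 12.0, 18.0]
```

The probe order within the chip is unchanged, but the values differ depending on which other chip it was normalized with. That is the reason for normalizing all collected arrays together.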

score 3 · Answer 2 · 2011-09-07

Please check the ArrayExpress Atlas: http://www.ebi.ac.uk/gxa/

It is a curated subset of ArrayExpress containing studies that the curators consider useful for the kind of comparison you want to make.

If I remember correctly, it also already provides data re-normalized with RMA, to make the data as comparable as possible. But you will almost certainly need a statistical modelling approach that includes study as a factor.
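As a rough idea of what "study as a factor" can mean for a single probe set, the sketch below fits a one-way model y = mu + alpha_study + error with sum-to-zero study effects. Under that coding the least-squares estimate of mu is the unweighted mean of the per-study means. The helper `fit_study_factor` and the log2 values are hypothetical; a real analysis would more likely use a dedicated linear-modelling package.

```python
from statistics import mean

def fit_study_factor(values_by_study):
    """Fit y = mu + alpha_study + error for one probe set, with sum(alpha) = 0.

    With sum-to-zero coding the least-squares estimates are:
    mu = unweighted mean of the per-study means, alpha = study mean - mu.
    """
    study_means = {s: mean(v) for s, v in values_by_study.items()}
    mu = mean(study_means.values())
    effects = {s: m - mu for s, m in study_means.items()}
    return mu, effects

# Hypothetical log2 expression values for one probe set in three studies.
probe_set = {
    "study_1": [8.0, 8.5, 7.5, 8.0, 8.0, 8.0],  # larger study
    "study_2": [9.0, 9.0],
    "study_3": [10.0, 10.0],
}

mu, effects = fit_study_factor(probe_set)
# Naive pooling ignores study and lets the largest study dominate.
pooled = mean(v for vals in probe_set.values() for v in vals)

print(mu, effects)
print(pooled)
```

In this toy case the pooled mean is pulled toward the largest study, while the model-based estimate weights each study equally and reports the per-study offsets separately.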

Ram · Answer 3 · 2011-09-07

1

12.6 years ago

Qdjm 1.9k

If all you need is a rank ordering of SLC transporter transcripts, you could try sorting the expression levels in each array separately and then replacing each expression level with its sort order in the array. Then your "expression level" for each gene would be its median (or mean) rank (i.e. sort order) across all of the hippocampal arrays.

The advantage of this approach is that you need not worry about making all the measurements comparable. If you have enough samples, I bet you'll get virtually the same answer as a re-normalization approach.
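The rank-based idea described above can be sketched in a few lines of Python. The gene labels (`SLC_A` and so on) and the expression values are placeholders, and the helper names are invented for this example; rank 1 is taken to mean the most highly expressed transcript on an array.

```python
from statistics import median

def ranks_within_array(values):
    """Return 1-based ranks for one array, 1 = highest expression.

    Ties are broken by position, which is sufficient for a sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def median_rank_per_gene(genes, arrays):
    """Median of each gene's within-array rank across all arrays."""
    per_array = [ranks_within_array(a) for a in arrays]
    return {g: median(r[i] for r in per_array) for i, g in enumerate(genes)}

# Placeholder genes; three arrays deliberately left on different scales.
genes = ["SLC_A", "SLC_B", "SLC_C", "SLC_D"]
arrays = [
    [12.1, 11.0, 9.5, 8.0],          # e.g. log2 scale
    [2100.0, 2500.0, 900.0, 400.0],  # e.g. raw intensities
    [0.9, 0.7, 0.2, 0.4],            # e.g. some other scaling
]

med = median_rank_per_gene(genes, arrays)
ordered = sorted(genes, key=lambda g: med[g])
print(ordered)  # ['SLC_A', 'SLC_B', 'SLC_C', 'SLC_D']
```

Because only the order within each array is used, the three arrays do not need to share a scale. On real data the ranks would be computed over all probe sets on the array before subsetting to the SLC transporters.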

In addition to Michael and Chris' suggestions, you might also need to run ComBat.R to adjust for batch effects when combining data from different labs. See the answers to this question on combining gene expression from multiple arrays.

updated 4.6 years ago by Ram 43k • written 12.6 years ago by Qdjm 1.9k

0

Thank you all for your insightful comments. I do just need a rank ordering of transcripts, so the simplicity of Quaid's approach is very appealing. Do you think anything would be gained by also running the individual array data through RankProd?

12.6 years ago by Nasir ▴ 50

0

Can't say, I have never used RankProd. But if it's a way to determine confidence intervals using ranks, it sounds like it would be a good thing to do.

12.6 years ago by Qdjm 1.9k