Question

How to integrate multiple datasets from different microarray

0

3.6 years ago

ijvechetti ▴ 10

Hello,

I want to compare two published microarray datasets (same GPL, same chip type, same species, and same experimental design) that differ in age (young vs old). The arrays were also run by the same group, just published in different years. Would it be OK to download all the CEL files, run RMA on all samples together, and then model my statistical analysis in limma with the two datasets treated as different batches?

Now, what if I want to add another dataset that differs in GPL and chip type? Is that possible?

What do you guys recommend?

Thanks in advance

Ivan

microarray R • 812 views

ADD COMMENT • link updated 3.6 years ago by Kevin Blighe 87k • written 3.6 years ago by ijvechetti ▴ 10

score 2 · Answer 1 · 2020-09-04

2

3.6 years ago

Kevin Blighe 87k

If everything is the same other than the fact that they are simply 2 different experiments, then, yes, I would process all CEL files together. You will still likely see some effect of batch via, e.g., a PCA bi-plot (please check for this), in which case you can do the following:

For any differential expression comparisons, simply include batch in your design formula, e.g., ~ batch + treatment. Any test statistics that you derive for treatment will then automatically be adjusted for the effect of batch.
For PCA, clustering, heatmaps, etc., eliminate the batch effect from the log2 expression data via limma::removeBatchEffect().
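As a rough illustration of the first point, here is a minimal sketch of a limma fit with batch in the design. Everything about the data is an assumption for the example: the matrix `expr` is simulated and only stands in for the RMA-normalised log2 values, the batch labels `study1`/`study2` are made up, and `age` plays the role of `treatment` in the formula above.

```r
library(limma)

set.seed(1)
# Simulated log2 expression: 100 probes x 12 samples (stand-in for RMA output)
expr <- matrix(rnorm(100 * 12, mean = 8), nrow = 100,
               dimnames = list(paste0("probe", 1:100), paste0("s", 1:12)))

# Hypothetical sample annotation: two studies (batches), young and old in each
batch <- factor(rep(c("study1", "study2"), each = 6))
age   <- factor(rep(c("young", "old"), times = 6), levels = c("young", "old"))

# Simulate a batch shift on every probe and an age effect on the first 10 probes
expr[, batch == "study2"] <- expr[, batch == "study2"] + 1.5
expr[1:10, age == "old"]  <- expr[1:10, age == "old"] + 2

# Batch enters the model as a covariate, so the age coefficient is adjusted for it
design <- model.matrix(~ batch + age)
fit <- eBayes(lmFit(expr, design))

# Test statistics for old vs young, adjusted for batch
top <- topTable(fit, coef = "ageold", number = 10)
print(top)
```

Note that the model is fitted on the uncorrected expression values; the batch adjustment happens through the design matrix, not by modifying the data first.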

Kevin

ADD COMMENT • link 3.6 years ago by Kevin Blighe 87k

0

Thank you so much. Just for clarification on your second point: do you mean plotting the summarized values after RMA, before running any statistical model? Also, if you don't mind, would it be possible to compare arrays from different platforms?

ADD REPLY • link 3.6 years ago by ijvechetti ▴ 10

1

For PCA, clustering, etc., yes, these are generated from the RMA-normalised data, from which you may also have subtracted out the effect of batch (and, to clarify, the effect of batch is subtracted from the RMA-normalised data itself).
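A minimal sketch of that visualisation step, again on simulated data rather than real RMA output: the matrix `expr`, the batch labels, and the `age` factor are all assumptions made up for the example. Passing a design with `age` to `removeBatchEffect()` is one way to ask limma to preserve the age differences while the batch term is subtracted.

```r
library(limma)

set.seed(1)
# Simulated log2 expression: 100 probes x 12 samples (stand-in for RMA output)
expr  <- matrix(rnorm(100 * 12, mean = 8), nrow = 100)
batch <- factor(rep(c("study1", "study2"), each = 6))
age   <- factor(rep(c("young", "old"), times = 6), levels = c("young", "old"))

# Simulate a batch shift on every probe
expr[, batch == "study2"] <- expr[, batch == "study2"] + 1.5

# Subtract the batch term from the normalised matrix itself,
# protecting the age effect via the 'design' argument
expr_adj <- removeBatchEffect(expr, batch = batch,
                              design = model.matrix(~ age))

# PCA on samples (prcomp expects samples in rows, hence the transpose)
pca_raw <- prcomp(t(expr))
pca_adj <- prcomp(t(expr_adj))

# Colour by batch: samples should separate by batch before, not after
plot(pca_raw$x[, 1:2], col = as.integer(batch), pch = 19, main = "Before")
plot(pca_adj$x[, 1:2], col = as.integer(batch), pch = 19, main = "After")
```

The adjusted matrix is meant for plots and clustering only; for differential expression, keep the uncorrected values and put batch in the design formula instead.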

For your other question, there are no standards - you could take a look at my previous answer here: A: How to integrate multiple data sets from microarray platform prior meta-analysis

ADD REPLY • link 3.6 years ago by Kevin Blighe 87k