Just to add some crucial comments that extend Kevin's comprehensive answer:
1) Firstly, regarding the two different microarray platforms used: is the array type the same within each platform? For instance, for Affymetrix you mentioned from the link above that it is Affymetrix hgu133a, but are there also other "sub-platforms" (array versions)? Are the experimental conditions similar as well? Or are there evident variations in the experimental design concerning the developmental stages in Arabidopsis that could lead to a clear batch effect? In other words, when combining any of the datasets (even ones from the same Affymetrix platform), you would generally have to construct a rather complicated design to account for experiment/study-specific effects (as well as other potential problems with normalization, variance estimation, etc.).
2) In my opinion, a first "basic and powerful" approach, again provided you have similar experimental designs and biological questions, would be to perform the DE analysis for each dataset separately. Then:
A) I would initially compare the DE probes (or, more appropriately, the gene symbols they annotate to) to find any "common genuine DE genes" that are consistently identified across the different datasets or experiments, as well as possible differences.
B) In parallel, you could perform a kind of "functional-enrichment meta-analysis": again for each of your separate DE lists, conduct a GO/KEGG analysis and inspect which biological pathways or processes appear in common across the different datasets.
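To make steps A) and B) a bit more concrete, here is a minimal Python sketch (standard library only). The gene names and fold changes are made up, and `compare_de_lists` and `hypergeom_pvalue` are hypothetical helper names of my own, not functions from any package. The second helper is just the plain one-sided hypergeometric test that commonly underlies GO/KEGG over-representation analysis; a real enrichment tool would also handle annotation and multiple-testing correction.

```python
from math import comb

# Hypothetical per-dataset DE results: gene symbol -> log2 fold change.
# These values are invented purely for illustration.
de_study_a = {"GENE1": 1.8, "GENE2": -2.1, "GENE3": 1.2, "GENE4": -1.5}
de_study_b = {"GENE2": -1.7, "GENE3": -1.1, "GENE4": -2.0, "GENE5": 2.3}


def compare_de_lists(a, b):
    """Split DE genes into shared (same or opposite direction) and study-specific."""
    shared = set(a) & set(b)
    # A shared gene is concordant when it changes in the same direction in both studies.
    concordant = {g for g in shared if (a[g] > 0) == (b[g] > 0)}
    return {
        "concordant": sorted(concordant),
        "discordant": sorted(shared - concordant),
        "only_a": sorted(set(a) - set(b)),
        "only_b": sorted(set(b) - set(a)),
    }


def hypergeom_pvalue(n_universe, n_in_term, n_de, n_overlap):
    """One-sided P(X >= n_overlap) for n_de genes drawn from n_universe,
    of which n_in_term are annotated to the pathway/term."""
    upper = min(n_in_term, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_in_term, k) * comb(n_universe - n_in_term, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


result = compare_de_lists(de_study_a, de_study_b)
print(result["concordant"], result["discordant"])
```

Running the enrichment test separately on each DE list and then comparing which terms come up in both is the "functional-enrichment meta-analysis" idea; the direction check matters because a gene that is DE in both studies but in opposite directions is not really a common finding.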
Finally, if you would like to try the merging approach, you could follow the instructions above and perhaps perform a batch effect correction with ComBat from the R package sva, treating the experimental study as the known batch variable.
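Just to illustrate what "correcting for the study" means, here is a deliberately crude Python sketch: per-gene mean centering within each study. This is NOT ComBat (which uses an empirical Bayes model and can also adjust variances), only a much simpler stand-in. The function name `center_by_study`, the data layout and the numbers are my own assumptions.

```python
from collections import defaultdict
from statistics import mean


def center_by_study(expression, study_of_sample):
    """Subtract, for every gene, its within-study mean from each sample's value.

    expression:      gene -> {sample id -> expression value}
    study_of_sample: sample id -> study label (the known batch)
    """
    corrected = {}
    for gene, values in expression.items():
        # Group this gene's values by the study each sample belongs to.
        by_study = defaultdict(list)
        for sample, value in values.items():
            by_study[study_of_sample[sample]].append(value)
        study_means = {study: mean(vals) for study, vals in by_study.items()}
        corrected[gene] = {
            sample: value - study_means[study_of_sample[sample]]
            for sample, value in values.items()
        }
    return corrected


# Invented toy data: study B sits about 5 units higher than study A.
expression = {"GENE1": {"a1": 5.0, "a2": 7.0, "b1": 10.0, "b2": 12.0}}
study_of_sample = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
centered = center_by_study(expression, study_of_sample)
print(centered["GENE1"])
```

Note that any such correction is only safe when the biological groups are reasonably balanced across studies. If one study contains only one condition, removing the study effect also removes the biology.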
(*Regarding RankProd, it is another possibility, but I would again suggest the R package RankAggreg, which seems more appropriate for your approach: you would analyze each dataset separately, keep the top-k genes ranked by some criterion, and then aggregate the ranked lists to keep the most "informative genes".)
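As a rough illustration of the rank-aggregation idea, here is a small Python sketch using a simple Borda count over top-k lists. This is not the algorithm implemented in RankAggreg, just the simplest aggregation scheme I can show in a few lines. The function name `borda_aggregate` and the gene lists are made up.

```python
from collections import defaultdict


def borda_aggregate(ranked_lists, k):
    """Combine top-k ranked gene lists with a Borda count.

    A gene at position p (0-based) in a list earns k - p points;
    genes missing from a list's top k earn nothing from that list.
    """
    scores = defaultdict(int)
    for ranking in ranked_lists:
        for position, gene in enumerate(ranking[:k]):
            scores[gene] += k - position
    # Highest total score first; ties are broken alphabetically.
    return sorted(scores, key=lambda g: (-scores[g], g))


# Hypothetical top-3 lists from three separately analyzed datasets.
top_lists = [
    ["G1", "G2", "G3"],
    ["G2", "G1", "G4"],
    ["G2", "G3", "G1"],
]
consensus = borda_aggregate(top_lists, k=3)
print(consensus)
```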
Hope that helps,
modified 17 months ago
17 months ago by
svlachavas • 560