Question: Aggregating Microarrays from different experiments
4.3 years ago by
United States
ad30 wrote:

Let's say I had two separate gene expression microarray experiments comparing normal vs. cancer cells, for the same cell type and on the same platform, such as the Affymetrix Human Genome U133A Array. Would there be any pitfalls in aggregating the data by taking the CEL files from both, RMA-normalizing them together, and then comparing the aggregate control to the aggregate cancer group? If so, would it be best to start from the CEL files, or could the aggregation also work on more processed downstream data, such as the expression values from the SOFT files in the GEO database?

modified 4.3 years ago by matt.newman130 • written 4.3 years ago by ad30
4.3 years ago by
Sydney, Australia
Neilfws48k wrote:

Plenty of pitfalls, yes. But different studies can be combined; it's called meta-analysis. As a starting point, you may like to investigate the R/Bioconductor package RankProd, which was written for this purpose.
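To give a feel for the rank-product idea, here is a minimal Python sketch. It is only an illustration of the statistic, not the RankProd package's API: the function name `rank_product` and the fold-change values are made up, ties are not handled, and the significance estimation that the real package performs is not attempted.

```python
import math

def rank_product(fold_changes_by_study):
    """Geometric mean of each gene's within-study rank.

    fold_changes_by_study: list of lists, one list of log2 fold changes
    per study, with genes in the same order in every study.
    Rank 1 = most up-regulated gene in that study. Ties are not handled.
    """
    n_genes = len(fold_changes_by_study[0])
    n_studies = len(fold_changes_by_study)
    log_rank_sums = [0.0] * n_genes
    for study in fold_changes_by_study:
        # Gene indices ordered from largest to smallest fold change
        order = sorted(range(n_genes), key=lambda g: study[g], reverse=True)
        for rank, gene in enumerate(order, start=1):
            log_rank_sums[gene] += math.log(rank)
    # exp(mean(log(rank))) is the geometric mean of the ranks
    return [math.exp(s / n_studies) for s in log_rank_sums]

# Hypothetical log2 fold changes (cancer vs. normal) for four genes
study_1 = [2.0, 0.1, 1.5, -1.0]
study_2 = [1.8, -0.2, 2.1, -0.9]
rp = rank_product([study_1, study_2])
```

Because only within-study ranks enter the statistic, genes that rank near the top in every study get a small rank product even when the studies' raw intensities are not directly comparable.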


I agree with Neilfws and disagree with matt.newman and andrew, though I would add that you should include batch/study as a covariate in the design matrix. I haven't used RankProd before, but it looks appealing. If you want to find genes relevant to your cancer, comparing the end results of the two studies (p-values of fold changes) gives you worse sensitivity and worse specificity than an integrated meta-analysis. For example, genes that are found differential in one study and not in the other are often borderline significant in both studies. A Venn diagram, or a side-by-side comparison of the two studies' p-values, doesn't consider this information, while an integrated meta-analysis does.
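The point about borderline genes can be made concrete with a small sketch. It uses Fisher's method for combining p-values, which is just one simple meta-analysis technique chosen here for illustration (not necessarily what any package mentioned in this thread does), and it assumes the studies are independent. The function name and the p-values are hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic x = -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom under the null, whose upper tail has the
    closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is x / 2
    term, total = 1.0, 0.0
    for i in range(k):
        total += term                 # adds (x/2)^i / i!
        term *= half_stat / (i + 1)   # next term of the series
    return math.exp(-half_stat) * total

# A hypothetical gene that narrowly misses 0.05 in both studies
p_study_1, p_study_2 = 0.06, 0.07
in_venn_overlap = p_study_1 < 0.05 and p_study_2 < 0.05  # False
combined = fisher_combined_p([p_study_1, p_study_2])     # about 0.027
```

A Venn-style comparison at a 0.05 cutoff drops this gene from both lists, while the combined evidence across the two studies falls below 0.05.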

modified 4.3 years ago • written 4.3 years ago by Irsan7.0k

I guess it really depends on how many datasets you're comparing. Take this one, for example (taken from ImmunoLand by Omicsoft):

The x-axis represents the log2 fold change, while the size of each dot indicates the p-value. Each dot is a comparison in a particular GEO dataset. I think you can conclude that this gene (and genes with similar patterns) is consistently up-regulated in skin disease and IBD when compared to normal.



written 4.3 years ago by matt.newman130
Sure, things that are very solid will be more easily identified using both methods.
written 4.3 years ago by Irsan7.0k
4.3 years ago by
United States
andrew470 wrote:

As Neilfws points out, it can be done, but that doesn't mean it should be done. We have an application designed specifically for this called iPathwayGuide. It is a web-based application that doesn't require any coding experience. Simply upload your CEL files in the groups you wish to analyze. iPathwayGuide will QC-check and normalize (GCRMA) the CEL files automatically. Information about DEGs, predicted miRNAs, GO terms, pathways, and diseases is provided in minutes. If you have multiple analyses, you can generate a meta report that gives you information about the overlap between the datasets. Here's a brief video.

4.3 years ago by
United States
matt.newman130 wrote:

I'd consider looking at the differentially expressed genes from each dataset and comparing those between the datasets, rather than trying to aggregate the raw (or normalized) data all at once. We do something similar with public datasets in our ImmunoLand product.
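A results-level comparison like this can be sketched in a few lines of Python. This is only an illustration under assumptions: the gene names and fold changes are placeholders, the function name is made up, and it is not how ImmunoLand works internally.

```python
def compare_deg_lists(deg_a, deg_b):
    """Compare two differential-expression result sets.

    deg_a, deg_b: dict mapping gene -> log2 fold change, containing only
    the genes each study called significant.
    """
    shared = set(deg_a) & set(deg_b)
    # Shared genes whose direction of change agrees in both studies
    concordant = {g for g in shared if (deg_a[g] > 0) == (deg_b[g] > 0)}
    return {
        "shared": shared,
        "concordant": concordant,
        "only_a": set(deg_a) - shared,
        "only_b": set(deg_b) - shared,
    }

# Placeholder results; not real data
deg_study_1 = {"GENE_A": 1.9, "GENE_B": -1.2, "GENE_C": 0.8}
deg_study_2 = {"GENE_A": 2.3, "GENE_B": 1.1, "GENE_D": -1.5}
overlap = compare_deg_lists(deg_study_1, deg_study_2)
```

Checking the direction of change as well as list membership avoids counting a gene as replicated when it is up in one study and down in the other.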
