How To Merge Two Microarray Datasets?
9.8 years ago
fbrundu ▴ 330

Hi all, I am trying to merge two microarray datasets, as in this paper. I do not understand how to do it, because the datasets do not share the same set of sample names, and I could not find any field that relates one dataset to the other. Any hint on how to do it?
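Because the two series contain different samples, there is nothing to match on the sample side. The join is made on the rows (probe or gene IDs), and the sample columns are placed side by side. Below is a minimal sketch in Python, used here only for illustration (the same logic is short in R). It assumes each dataset is already a dict that maps feature ID to a list of per-sample values. The function name, probe IDs, and sample names are hypothetical.

```python
def merge_on_features(expr_a, samples_a, expr_b, samples_b):
    """Column-bind two expression tables on their shared feature (row) IDs.

    expr_a / expr_b: dict mapping probe or gene ID -> list of values, one per sample.
    samples_a / samples_b: sample names, in the same order as the value lists.
    Features present in only one dataset are dropped (an inner join).
    """
    shared = sorted(set(expr_a) & set(expr_b))
    merged = {fid: list(expr_a[fid]) + list(expr_b[fid]) for fid in shared}
    # Record which dataset each sample came from; this becomes the batch label.
    batch = ["A"] * len(samples_a) + ["B"] * len(samples_b)
    return merged, list(samples_a) + list(samples_b), batch


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the GEO series in the question.
    a = {"probe1": [5.1, 5.3], "probe2": [7.0, 6.8], "probe3": [2.2, 2.0]}
    b = {"probe1": [6.0], "probe2": [7.9], "probe4": [3.3]}
    merged, samples, batch = merge_on_features(a, ["GSM_a1", "GSM_a2"], b, ["GSM_b1"])
    print(merged)   # {'probe1': [5.1, 5.3, 6.0], 'probe2': [7.0, 6.8, 7.9]}
    print(samples)  # ['GSM_a1', 'GSM_a2', 'GSM_b1']
    print(batch)    # ['A', 'A', 'B']
```

Keep the batch label. Every later step (normalization, batch correction, plotting) needs to know which study each sample came from.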

The two datasets are this and this.

Thanks


How can we integrate GSE series that use different GPL platforms?


I am also interested in integrating GSE with different GPL (GPL96 vs. GPL3921), did you find any solution for your problem?
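When the platforms differ, probe IDs may not match between series. A common workaround is to map each platform's probes to a shared identifier, such as the gene symbol from each GPL annotation table. You then collapse the probes that hit the same gene and join on genes. The Python sketch below averages probes per gene, which is only one choice; others keep the probe with the highest mean or variance. The mapping dicts and IDs are hypothetical. In practice they would be parsed from the GPL annotation.

```python
from statistics import mean


def collapse_to_genes(expr, probe_to_gene):
    """Collapse a probe-level table to gene level by averaging probes per gene.

    expr: dict probe ID -> list of per-sample values.
    probe_to_gene: dict probe ID -> gene symbol (e.g. parsed from a GPL annotation).
    Probes without a gene mapping are dropped.
    """
    grouped = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene:
            grouped.setdefault(gene, []).append(values)
    # Average each sample position across all probes mapped to the same gene.
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


def merge_platforms(expr_a, map_a, expr_b, map_b):
    """Join two datasets from different platforms on their shared gene symbols."""
    genes_a = collapse_to_genes(expr_a, map_a)
    genes_b = collapse_to_genes(expr_b, map_b)
    shared = sorted(set(genes_a) & set(genes_b))
    return {g: genes_a[g] + genes_b[g] for g in shared}


if __name__ == "__main__":
    # Hypothetical probes: two platform-A probes map to TP53, one to EGFR.
    a = {"probeA1": [4.0, 6.0], "probeA2": [6.0, 8.0], "probeA3": [1.0, 1.0]}
    map_a = {"probeA1": "TP53", "probeA2": "TP53", "probeA3": "EGFR"}
    b = {"probeB1": [5.5], "probeB2": [3.0]}
    map_b = {"probeB1": "TP53", "probeB2": "MYC"}
    print(merge_platforms(a, map_a, b, map_b))  # {'TP53': [5.0, 7.0, 5.5]}
```

Only genes measured on both platforms survive the join. Cross-platform data also carry a stronger batch effect than same-platform data.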

8.9 years ago
alaincoletta ▴ 160

InSilico DB has a merging R/Bioconductor package (inSilicoMerging) to combine public datasets from GEO. If you are not using R, you can also combine data on the online platform (see this short step-by-step tutorial).

Example:

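library(inSilicoDb)       # provides getDataset()
library(inSilicoMerging)  # provides merge() for lists of ExpressionSets, and plotMDS()

# Retrieve 2 datasets
eset1 = getDataset(gse="GSE10072", gpl="GPL96", norm="ORIGINAL", genes=TRUE)
eset2 = getDataset(gse="GSE7670", gpl="GPL96", norm="ORIGINAL", genes=TRUE)

# Combine them, without any batch-effect correction
esets = list(eset1, eset2)
eset = merge(esets, method="NONE")

# Plot them
plotMDS(eset, targetAnnot="Disease", batchAnnot="Study")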


InSilico DB packages several batch-effect removal methods, so the merge(esets, method="NONE") line could be replaced with:

eset = merge(esets, method="XPN");

# or

eset = merge(esets, method="COMBAT");
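The Python sketch below shows the simplest form of what these methods do: remove a per-gene, per-batch offset. It is not ComBat and not XPN. It applies no variance scaling, no empirical Bayes shrinkage, and no cross-platform clustering. It is just per-batch mean-centering, a much cruder stand-in, shown only to illustrate the idea. The function name and toy values are hypothetical.

```python
from statistics import mean


def center_by_batch(merged, batch):
    """Subtract each batch's per-gene mean (a location-only batch adjustment).

    merged: dict gene -> list of values across all samples.
    batch: list of batch labels, in the same order as the value lists.
    Not ComBat or XPN: no variance scaling, no empirical Bayes shrinkage.
    """
    labels = sorted(set(batch))
    adjusted = {}
    for gene, values in merged.items():
        batch_means = {
            lab: mean(v for v, b in zip(values, batch) if b == lab) for lab in labels
        }
        adjusted[gene] = [v - batch_means[b] for v, b in zip(values, batch)]
    return adjusted


if __name__ == "__main__":
    merged = {"g1": [5.0, 7.0, 10.0, 12.0]}
    print(center_by_batch(merged, ["A", "A", "B", "B"]))  # {'g1': [-1.0, 1.0, -1.0, 1.0]}
```

Centering also removes any real biological difference that happens to coincide with batch. The same caveat applies, in weaker form, to the packaged methods.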


Hope this helps.

9.8 years ago

You can certainly download all the .CEL files and normalize them together. However, you may find that your hypothesis testing could be challenging since there will likely be a batch effect between the two datasets.
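Normalizing together usually means running RMA or a similar pipeline (for example with the affy or oligo packages in R) on all .CEL files at once. Those pipelines combine background correction, quantile normalization, and probe-set summarization. The Python sketch below shows only the quantile normalization step on an already-extracted matrix. It makes every sample share one value distribution, but it does not remove gene-specific batch effects. Ties are broken by position rather than averaged, which is a simplification. The function name and toy values are hypothetical.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples so they all share the same value distribution.

    columns: list of samples, each a list of values over the same features (rows).
    Ties are broken by position rather than averaged (a simplification).
    """
    n_rows = len(columns[0])
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n_rows)]
    normalized = []
    for col in columns:
        # Row indices of this sample ordered from smallest to largest value.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]  # two samples, three features
    print(quantile_normalize(samples))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```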

8.9 years ago

In theory, if you have two raw expression sets from the same array model, you can simply bind one to the other, matching the probes by ID. However, doing this creates a whole world of problems, and there would have to be a very good justification for it. The first problem is the batch effect between the two datasets, as previously mentioned. If you manage to correct for that successfully, you might get some meaningful data out of the analysis, but only might. Merging is a post-hoc experimental design decision and is not recommended.
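One design issue decides whether batch correction can work at all. If each study contains only one condition (say, all tumors in one series and all normals in the other), then the batch effect and the biology are fully confounded, and no method can separate them. Below is a minimal Python check of that situation, run on the per-sample annotations. The function name and labels are hypothetical.

```python
from collections import defaultdict


def is_batch_confounded(batch, condition):
    """Return True if batch and condition are fully confounded.

    That is the case when every batch contains only a single condition level,
    so any difference between batches is indistinguishable from biology.
    """
    levels = defaultdict(set)
    for b, c in zip(batch, condition):
        levels[b].add(c)
    return all(len(conds) == 1 for conds in levels.values())


if __name__ == "__main__":
    batch = ["A", "A", "B", "B"]
    print(is_batch_confounded(batch, ["tumor", "tumor", "normal", "normal"]))  # True
    print(is_batch_confounded(batch, ["tumor", "normal", "tumor", "normal"]))  # False
```

A False result is necessary but not sufficient. Partial confounding, such as very unbalanced conditions per batch, can still make the corrected data unreliable.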