Question

How To Merge Two Microarray Datasets?

2

12.1 years ago

fbrundu ▴ 350

Hi all, I am trying to merge two microarray datasets, as in this paper. I do not understand how to do it, because the two datasets do not share the same set of sample names, and I could not find any field that relates one dataset to the other. Any hint on how to do it?

The two datasets are this and this.

Thanks

microarray merge dataset • 10k views

Updated 2.7 years ago by Ram 45k • written 12.1 years ago by fbrundu ▴ 350

added answer*

Written 11.2 years ago by andrew.j.skelton73 6.6k

How can we integrate GSE datasets with different GPL platforms?

Updated 5.5 years ago by Ram 45k • written 10.3 years ago by Shamim Sarhadi ▴ 220

I am also interested in integrating GSE datasets with different GPL platforms (GPL96 vs. GPL3921). Did you find any solution for your problem?

Updated 2.7 years ago by Ram 45k • written 9.9 years ago by Bioinformatist Newbie ▴ 270

Ram · Answer 1 · 2014-04-22

InSilico DB has a "merging" R-Bioconductor package to combine public datasets from GEO. If you are not using R, you can also combine data on the online platform (see this short step-by-step tutorial).

Example:

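# Assumes the inSilicoDb and inSilicoMerging Bioconductor packages are installed
library(inSilicoDb);
library(inSilicoMerging);

# Retrieve two datasets measured on the same platform (GPL96)
eset1 = getDataset(gse="GSE10072", gpl="GPL96", norm="ORIGINAL", genes=TRUE);
eset2 = getDataset(gse="GSE7670", gpl="GPL96", norm="ORIGINAL", genes=TRUE);

# Combine them without any batch effect correction
esets = list(eset1, eset2);
eset = merge(esets, method="NONE");

# Plot the merged samples, labelled by disease status and by study of origin
plotMDS(eset, targetAnnot="Disease", batchAnnot="Study");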

InSilico DB packages several batch effect removal methods, so the merge call with method="NONE" could be replaced with:

eset = merge(esets, method="XPN");

# or

eset = merge(esets, method="COMBAT");
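As a rough illustration of what a batch adjustment does to a merged matrix, the sketch below applies per-gene mean centering within each study, which is about the simplest adjustment there is. It is not how XPN or ComBat work internally; the simulated data, the `study` labels, and the `center_by_batch` helper are all made up for the example, and only base R is used.

```r
# Simulated stand-in for a merged expression matrix: genes in rows, samples in columns.
set.seed(1)
n_genes <- 100
study <- rep(c("A", "B"), each = 5)                  # hypothetical study labels
expr <- matrix(rnorm(n_genes * 10), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("s", 1:10)))
expr[, study == "B"] <- expr[, study == "B"] + 2     # artificial study-wide shift

# Subtract each gene's within-study mean, one study at a time.
center_by_batch <- function(x, batch) {
  for (b in unique(batch)) {
    cols <- batch == b
    x[, cols] <- x[, cols] - rowMeans(x[, cols, drop = FALSE])
  }
  x
}

centered <- center_by_batch(expr, study)

# Gap between the two study-wide averages, before and after centering
mean(expr[, study == "B"]) - mean(expr[, study == "A"])
mean(centered[, study == "B"]) - mean(centered[, study == "A"])
```

Centering like this assumes both studies contain a comparable mix of sample groups. If one study is all tumour and the other all normal, it would remove the biology along with the batch effect.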

Hope this helps.

For more info, see the Bioinformatics paper reference, the InSilico DB and InSilico Merging package links, and the blog link.

Tutorial example: https://insilicodb.org/the-impact-of-batch-effects-when-merging-different-data-sets/

R-Bioconductor packages:

score 1 · Answer 2 · 2013-06-10

12.1 years ago

Sean Davis 27k

You can certainly download all the .CEL files and normalize them together. However, you may find that your hypothesis testing could be challenging since there will likely be a batch effect between the two datasets.
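To see why the batch effect matters even after normalizing everything together, here is a small base-R sketch on simulated data. Quantile normalization stands in for a full CEL-level pipeline (for real files, something like `rma()` from the affy package would be the usual route), and `quantile_normalize`, the study labels, and the simulated offsets are assumptions made purely for illustration.

```r
set.seed(42)
n_genes <- 200
study <- rep(c("A", "B"), each = 5)                  # hypothetical study labels
expr <- matrix(rnorm(n_genes * 10), nrow = n_genes,
               dimnames = list(paste0("probe", seq_len(n_genes)),
                               paste0(study, "_", 1:5)))

# Simulated batch effect: every sample in study B gets the same probe-specific offset
offset <- rnorm(n_genes, sd = 2)
expr[, study == "B"] <- expr[, study == "B"] + offset

# Quantile normalization: force every sample onto the same empirical distribution
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  sorted <- apply(x, 2, sort)
  ref    <- rowMeans(sorted)                         # reference distribution
  normed <- apply(ranks, 2, function(r) ref[r])
  dimnames(normed) <- dimnames(x)
  normed
}

normed <- quantile_normalize(expr)

# PCA on samples: inspect how the first component relates to study of origin
pca <- prcomp(t(normed))
pc1 <- pca$x[, 1]
print(split(round(pc1, 1), study))
```

In this simulation the samples still separate by study along the first principal component even though their overall distributions are identical after normalization, which is the situation that makes downstream hypothesis testing awkward.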

Ram · Answer 3 · 2014-04-22

In theory, if you have two raw expression sets from the same array model, you can simply bind one to the other (accounting for probe position). However, doing this creates a whole world of problems, and there would have to be a very good justification for it. The first problem is that there will be a batch effect between the two datasets, as previously mentioned. If you manage to correct for that successfully, you might, and only might, get some meaningful data out of the analysis. Merging is a decision made after the experiments were designed, and it is not recommended.
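For what "binding one to the other" can look like in practice, here is a minimal base-R sketch that lines up two expression matrices by probe ID before joining them. The matrices, sample names, and values are invented for the example; the probe IDs are only there to make the row names look realistic.

```r
# Two hypothetical expression matrices (probes x samples) whose rows are in a
# different order and only partly overlap.
probes1 <- c("1007_s_at", "1053_at", "117_at", "121_at")
probes2 <- c("121_at", "1007_s_at", "1255_g_at", "117_at")
m1 <- matrix(1:8,     nrow = 4, dimnames = list(probes1, c("d1_s1", "d1_s2")))
m2 <- matrix(101:108, nrow = 4, dimnames = list(probes2, c("d2_s1", "d2_s2")))

# Keep only probes present in both datasets, in one consistent order
shared <- intersect(rownames(m1), rownames(m2))

# Index both matrices by probe ID so the rows line up, then bind the samples
merged <- cbind(m1[shared, , drop = FALSE], m2[shared, , drop = FALSE])

# Record which dataset each column came from, for later batch handling
batch <- rep(c("dataset1", "dataset2"), times = c(ncol(m1), ncol(m2)))

print(merged)
print(batch)
```

Indexing by row name rather than by position is the important part: a plain `cbind(m1, m2)` would silently pair up unrelated probes here. The `batch` vector is kept because any later correction or model needs to know each sample's dataset of origin.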