How To Merge Two Microarrays Datasets?
8.5 years ago
fbrundu ▴ 300

Hi all, I am trying to merge two microarray datasets, as in this paper. I do not understand how to do it, because the two datasets do not share the same set of sample names, and I could not find any field that relates one dataset to the other. Any hints on how to do it?

The two datasets are this and this.

Thanks

microarray merge dataset • 7.1k views

added answer*


How can we integrate GSE datasets with different GPL platforms?


I am also interested in integrating GSE datasets with different GPL platforms (GPL96 vs. GPL3921). Did you find a solution to your problem?
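One possible starting point for the cross-platform case, sketched here with toy matrices rather than real GEO data: restrict both datasets to the feature identifiers they have in common, then bind the samples side by side. The matrices `exprs_a` and `exprs_b` and their values are made up for illustration. With real data, whether two platforms actually share probe identifiers has to be checked against the GPL annotation; when they do not, probes are usually mapped to a common identifier such as gene symbols first.

```r
# Toy expression matrices (rows = probe IDs, columns = samples).
# All values and sample names are illustrative, not taken from GEO.
exprs_a <- matrix(c(7.1, 8.2, 5.5, 6.0, 9.3, 4.4),
                  nrow = 3,
                  dimnames = list(c("1007_s_at", "1053_at", "117_at"),
                                  c("A1", "A2")))
exprs_b <- matrix(c(5.9, 7.7, 8.8, 6.1, 7.0, 8.1),
                  nrow = 3,
                  dimnames = list(c("117_at", "1007_s_at", "121_at"),
                                  c("B1", "B2")))

# Keep only the identifiers present in both datasets.
common <- intersect(rownames(exprs_a), rownames(exprs_b))

# Index both matrices by the same ID vector so rows line up, then bind columns.
merged <- cbind(exprs_a[common, , drop = FALSE],
                exprs_b[common, , drop = FALSE])

# Record the study of origin of each sample for later batch-effect handling.
study <- rep(c("A", "B"), times = c(ncol(exprs_a), ncol(exprs_b)))
```

The merged matrix only contains the shared features, so the more the two platforms differ, the more information is lost at this step.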

7.6 years ago
alaincoletta ▴ 160

InSilico DB has a "merging" R/Bioconductor package for combining public datasets from GEO. If you are not using R, you can also combine data on the online platform (see this short step-by-step tutorial).

Example:
# Load the InSilico DB packages
library(inSilicoDb)
library(inSilicoMerging)

# Retrieve 2 datasets
eset1 <- getDataset(gse="GSE10072", gpl="GPL96", norm="ORIGINAL", genes=TRUE)
eset2 <- getDataset(gse="GSE7670", gpl="GPL96", norm="ORIGINAL", genes=TRUE)

# Combine them without any batch effect removal
esets <- list(eset1, eset2)
eset <- merge(esets, method="NONE")

# Plot them (MDS), annotated by disease and by study
plotMDS(eset, targetAnnot="Disease", batchAnnot="Study")

InSilico DB has packaged various batch effect removal methods, so the merge call could be replaced with:

eset <- merge(esets, method="XPN")
or
eset <- merge(esets, method="COMBAT")
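To give a feel for what a batch adjustment does, here is a minimal base-R sketch of the simplest possible approach: per-gene mean-centering within each study. This is not XPN or ComBat, and it is not how inSilicoMerging implements them; the matrix `x`, the `study` labels, and the helper `center_by_batch` are invented for this example.

```r
# Toy merged matrix: 2 genes x 4 samples; study B is shifted upward by 2.
x <- matrix(c(5, 6,  7, 8,  7, 8,  9, 10),
            nrow = 2,
            dimnames = list(c("geneA", "geneB"),
                            c("A1", "A2", "B1", "B2")))
study <- c("A", "A", "B", "B")

# Hypothetical helper: subtract each gene's within-batch mean.
center_by_batch <- function(x, batch) {
  out <- x
  for (b in unique(batch)) {
    cols <- batch == b
    block <- x[, cols, drop = FALSE]
    # rowMeans() has one value per gene; it is recycled down each column.
    out[, cols] <- block - rowMeans(block)
  }
  out
}

adjusted <- center_by_batch(x, study)
```

After centering, every gene has mean zero within each study, so the study-level shift is gone. The same subtraction would also remove a real biological difference if the two studies differ in composition (for example, if one contains mostly tumour samples), which is one reason to prefer a dedicated method over this shortcut.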

Hope this helps.

For more info, see the paper references, the inSilicoDb and inSilicoMerging package links, and the blog tutorial below.

- Unlocking the potential of publicly available microarray data using inSilicoDb and inSilicoMerging R/Bioconductor packages - BMC Bioinformatics [http://www.biomedcentral.com/1471-2105/13/335/abstract]

- inSilicoDb: an R/Bioconductor package for accessing human Affymetrix expert-curated datasets from GEO - Bioinformatics [http://bioinformatics.oxfordjournals.org/content/27/22/3204]

- Tutorial example: https://insilicodb.org/the-impact-of-batch-effects-when-merging-different-data-sets/

R-Bioconductor packages:
http://www.bioconductor.org/packages/2.12/bioc/html/inSilicoDb.html
and
http://www.bioconductor.org/packages/2.12/bioc/html/inSilicoMerging.html

8.5 years ago

You can certainly download all the .CEL files and normalize them together. However, hypothesis testing on the combined data may be challenging, since there will likely be a batch effect between the two datasets.
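As a rough illustration of what "normalizing together" means, the sketch below applies quantile normalization (one step of RMA-style pipelines) to a combined toy matrix in base R. It is not a replacement for reading and processing real .CEL files with a Bioconductor package; the function `quantile_normalize` and the data are invented for this example.

```r
# Hypothetical helper: force every sample (column) onto the same distribution.
quantile_normalize <- function(x) {
  # Rank of each value within its own sample.
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted values across all samples.
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value of the same rank.
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

# Toy combined matrix: samples from both studies in one matrix,
# with study 2 measured on a visibly higher scale.
combined <- cbind(
  study1_s1 = c(2, 4, 6),
  study1_s2 = c(3, 5, 7),
  study2_s1 = c(12, 10, 14),
  study2_s2 = c(11, 15, 13)
)
rownames(combined) <- c("p1", "p2", "p3")

normalized <- quantile_normalize(combined)
```

After this step all four samples contain the same set of values, so global intensity differences between the studies disappear. Probe-specific differences between studies can still remain, which is the batch effect this answer warns about.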

7.6 years ago

In theory, if you have two raw expression sets from the same array model, you can simply bind one to the other (accounting for probe location, so that corresponding probes line up). However, doing this creates a whole world of problems, and there would have to be a very good justification for it. The first problem is that there will be a batch effect between the two datasets, as previously mentioned. If you manage to correct for that successfully, you might get some meaningful data out of the analysis, but only might. Merging datasets is a post hoc experimental design decision and is not recommended.
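The binding step and the resulting batch effect can be sketched with simulated data. Note that the samples do not need to share names: the datasets are matched on probes (rows), and the samples (columns) are simply concatenated. Everything below is simulated; the probe names, the size of the study offset, and the use of PCA as a diagnostic are assumptions made for illustration.

```r
set.seed(1)
n_probes <- 50
probe_ids <- paste0("probe_", seq_len(n_probes))

# First dataset: 3 samples.
set1 <- matrix(rnorm(n_probes * 3, mean = 8), nrow = n_probes,
               dimnames = list(probe_ids, paste0("S1_", 1:3)))

# Second dataset: same probes plus a simulated per-probe study offset
# standing in for a batch effect (the offset is recycled across columns).
set2 <- matrix(rnorm(n_probes * 3, mean = 8) + rnorm(n_probes, sd = 2),
               nrow = n_probes,
               dimnames = list(probe_ids, paste0("S2_", 1:3)))
# Shuffle the rows to mimic a different probe order in the second file.
set2 <- set2[sample(n_probes), ]

# Reorder the second dataset to the probe order of the first, then bind.
set2_aligned <- set2[match(rownames(set1), rownames(set2)), ]
stopifnot(identical(rownames(set1), rownames(set2_aligned)))
merged <- cbind(set1, set2_aligned)

# Quick diagnostic: PCA on samples; inspect the first component by study.
pca <- prcomp(t(merged))
pc1 <- pca$x[, 1]
study <- rep(c("study1", "study2"), each = 3)
split(round(pc1, 2), study)
```

If the first principal component separates the samples by study rather than by the biological variable of interest, as it is constructed to do here, the merged data are dominated by the batch effect and any downstream test has to account for it.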
