Question: How To Merge Two Microarrays Datasets?
5.8 years ago by
fbrundu280
European Union
fbrundu280 wrote:

Hi all, I am trying to merge two microarray datasets, as in this paper. I do not understand how to do it because the two datasets do not share the same set of sample names, and I could not find any field that relates one dataset to the other. Any hints on how to do it?

The two datasets are this and this.

Thanks


added answer*

modified 4.9 years ago • written 4.9 years ago by andrew.j.skelton73

How can we integrate GSE series that use different GPL platforms?

written 4.0 years ago by Shamim Sarhadi

I am also interested in integrating GSE series that use different GPL platforms (GPL96 vs. GPL3921). Did you find a solution to your problem?

modified 3.6 years ago • written 3.6 years ago by Bioinformatist Newbie
4.9 years ago by
alaincoletta110
Belgium
alaincoletta110 wrote:

InSilico DB provides an R/Bioconductor package, inSilicoMerging, for combining public datasets from GEO. If you are not using R, you can also combine data on the online platform (see this short step-by-step tutorial).

Example:
# Load the InSilico DB packages
library(inSilicoDb)
library(inSilicoMerging)

# Retrieve two datasets measured on the same platform (GPL96)
eset1 = getDataset(gse="GSE10072", gpl="GPL96", norm="ORIGINAL", genes=TRUE)
eset2 = getDataset(gse="GSE7670", gpl="GPL96", norm="ORIGINAL", genes=TRUE)

# Combine them without any batch-effect correction
esets = list(eset1, eset2)
eset = merge(esets, method="NONE")

# Plot the merged data, annotated by disease and by study
plotMDS(eset, targetAnnot="Disease", batchAnnot="Study")

InSilico DB packages several batch-effect removal methods, so the merge() call above could be replaced with:

eset = merge(esets, method="XPN")
or
eset = merge(esets, method="COMBAT")

Hope this helps.

For more information, see the papers, the tutorial, and the package pages below.

- Unlocking the potential of publicly available microarray data using inSilicoDb and inSilicoMerging R/Bioconductor packages - BMC Bioinformatics [http://www.biomedcentral.com/1471-2105/13/335/abstract]

- inSilicoDb: an R/Bioconductor package for accessing human Affymetrix expert-curated datasets from GEO - Bioinformatics [http://bioinformatics.oxfordjournals.org/content/27/22/3204]

- Tutorial example: https://insilicodb.org/the-impact-of-batch-effects-when-merging-different-data-sets/

R-Bioconductor packages:
http://www.bioconductor.org/packages/2.12/bioc/html/inSilicoDb.html
and
http://www.bioconductor.org/packages/2.12/bioc/html/inSilicoMerging.html

5.8 years ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

You can certainly download all the .CEL files and normalize them together. However, you may find that your hypothesis testing could be challenging since there will likely be a batch effect between the two datasets.
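To make "normalize them together" concrete, here is a minimal base-R sketch that quantile-normalizes a combined matrix so every sample ends up with the same intensity distribution. It is only an illustration on simulated values: the matrices batch1 and batch2 and the helper quantile_normalize() are made up for this example, and with real .CEL files you would more likely read them all in one go and run something like RMA (for instance ReadAffy() and rma() from the affy package).

```r
# Toy quantile normalization across samples from two simulated batches.
# All names and values here are hypothetical, not taken from the GEO datasets.
quantile_normalize <- function(x) {
  # Rank of each probe within its sample (ties broken by order of appearance)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest value across samples
  target <- rowMeans(apply(x, 2, sort))
  # Replace every value by the reference value at its rank
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(42)
probes <- paste0("probe_", 1:6)
batch1 <- matrix(rnorm(6 * 2, mean = 7), nrow = 6,
                 dimnames = list(probes, c("s1", "s2")))
batch2 <- matrix(rnorm(6 * 3, mean = 9), nrow = 6,
                 dimnames = list(probes, c("s3", "s4", "s5")))

# Put both batches side by side, then normalize all samples jointly
combined <- cbind(batch1, batch2)
normalized <- quantile_normalize(combined)
```

Note that joint normalization only aligns the overall distributions. A probe-specific shift between the two studies can survive it, which is why the batch effect still has to be checked afterwards.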

4.9 years ago by
London
andrew.j.skelton735.5k wrote:

In theory, if you have two raw expression sets from the same array model, you can simply bind one to the other (accounting for probe location). However, doing this creates a whole world of problems, and there would have to be a very good justification for it. The first problem is that there will be a batch effect between the two datasets, as previously mentioned. If you manage to correct for that successfully, you might, and only might, get some meaningful data out of the analysis. This is a decision made after the experimental design stage and is not recommended.
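As a rough illustration of the binding step and of why the batch effect matters, here is a base-R sketch on simulated data. The matrices study1 and study2, the probe names, and the helper center_by_batch() are all hypothetical. The per-batch centering shown is a deliberately crude stand-in, not ComBat or XPN, and it will also remove real biological differences if they are confounded with the study.

```r
# Bind two simulated expression matrices from the "same array" by probe ID.
set.seed(1)
probes <- paste0("probe_", 1:5)
study1 <- matrix(rnorm(5 * 3, mean = 8), nrow = 5,
                 dimnames = list(probes, paste0("A", 1:3)))
# study2 lists the same probes in a different order and sits ~2 units higher
study2 <- matrix(rnorm(5 * 4, mean = 10), nrow = 5,
                 dimnames = list(rev(probes), paste0("B", 1:4)))

# Align rows by probe name before binding; never rely on row order alone
shared <- intersect(rownames(study1), rownames(study2))
merged <- cbind(study1[shared, ], study2[shared, ])

# Record which study each sample came from
batch <- factor(rep(c("study1", "study2"), c(ncol(study1), ncol(study2))))

# Crude correction: subtract each probe's mean within each batch
center_by_batch <- function(x, batch) {
  out <- x
  for (b in levels(batch)) {
    cols <- batch == b
    out[, cols] <- x[, cols, drop = FALSE] - rowMeans(x[, cols, drop = FALSE])
  }
  out
}
centered <- center_by_batch(merged, batch)

# Per-batch mean of every sample, before and after centering
before <- tapply(colMeans(merged), batch, mean)
after <- tapply(colMeans(centered), batch, mean)
```

Comparing before and after shows the shift between the studies disappearing, but keeping the batch factor around (for example as a covariate in the downstream model) is usually safer than trusting any single correction.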
