When doing microarray meta-analysis, people usually download the CEL files from databases like GEO, normalize the raw data with R packages such as affy, and then use R packages like MetaDE to run the meta-analysis on the normalized datasets. However, many datasets in GEO are provided in SOFT format, and these data are already normalized. Can I import SOFT datasets directly into MetaDE and do the meta-analysis?
Thank you for the reply! So is it valid to use the processed SOFT data directly, or do I need to go back to the CEL data and normalize the different datasets in one particular way?
There is no "right" answer to this question since there is no standard for normalization and not all datasets even supply raw data.
Just a comment: getGEO() will take an accession. There is no need to download SOFT files separately.
Thanks! I'm just curious: since there is no standard way to normalize, why do people choose to normalize their own way rather than trusting the original data curators? It adds workload and can be even less reliable.
Starting with raw data allows more thorough quality control and also ensures that the normalization was done appropriately (rather than taking the word of the original submitters). Not everyone does renormalize and not all data on GEO actually have enough raw data to perform an appropriate normalization.
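For the processed-data route discussed above, here is a minimal sketch of pulling a curated GDS by accession and getting an expression matrix out of it. The helper name gds_to_matrix and its log2_transform argument are made up for illustration. Whether the submitter's values are already log-transformed has to be checked per dataset, and how the matrix is then fed to MetaDE is not covered here.

```r
# Hypothetical helper (not part of GEOquery): fetch a GEO DataSet (GDS) by
# accession and return its expression matrix (probes in rows, samples in columns).
# Assumes the GEOquery and Biobase packages are installed.
gds_to_matrix <- function(accession, log2_transform = FALSE) {
  # getGEO() downloads and parses the SOFT file for the accession
  gds <- GEOquery::getGEO(accession)
  # GDS2eSet() converts the GDS object to an ExpressionSet;
  # do.log2 = TRUE applies a log2 transform to the values
  eset <- GEOquery::GDS2eSet(gds, do.log2 = log2_transform)
  Biobase::exprs(eset)
}

# Example usage (needs network access):
# mat <- gds_to_matrix("GDS507")
# dim(mat)
# summary(as.vector(mat))  # the value range hints at whether data are on a log scale
```

These are the values as processed by the original submitters, so the quality-control caveats above still apply.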
Thanks for the comments. I just tried importing GDS402 with GEOquery but got the error "cannot open connection", even though I can download the file from the website. I think I've ruled out internet connection problems, since I can import other datasets, such as GDS507, with GEOquery.
I used the command:
because when using getGEO(), I got this error message:
I guess it has something to do with Windows.
So did I do something wrong when using GEOquery, or did the GEO FTP site change its URL?
All you need is:
Just tried gds = getGEO('GDS402'), still have the error:
Please start a new R session, load the GEOquery library, run the line of code above, and paste in any error message along with the output of sessionInfo().
It worked after starting a new R session. Thanks a lot!
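For anyone who hits the same error, the fresh-session check requested above can be sketched roughly as follows. The wrapper try_fetch is a made-up name, not a GEOquery function. It only captures the error message so it can be pasted into a reply together with sessionInfo().

```r
# Hypothetical helper: attempt a download and return either the result or the
# error object. `fetch` defaults to GEOquery::getGEO, which is only looked up
# when the function is actually called without a replacement.
try_fetch <- function(accession, fetch = GEOquery::getGEO) {
  tryCatch(
    fetch(accession),
    error = function(e) {
      message("Download of ", accession, " failed: ", conditionMessage(e))
      e  # return the error object so the caller can inspect it
    }
  )
}

# In a fresh R session (needs GEOquery and network access):
# gds <- try_fetch("GDS402")
# sessionInfo()
```

If the call succeeds in a clean session but fails in the old one, the cause is more likely leftover state in that session than anything on the GEO side.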