Question: Microarray meta analysis with SOFT datasets
chenyangls10 wrote:

When doing microarray meta-analysis, people usually download the raw CEL files from databases like GEO, normalize them with R packages such as affy, and then run the meta-analysis on the normalized datasets with a package like MetaDE. However, many GEO datasets are provided in SOFT format, and these data are already normalized. Can I import SOFT datasets directly into MetaDE and do the meta-analysis?

R • 3.1k views
modified 4.8 years ago by Martombo2.4k • written 4.8 years ago by chenyangls10
Martombo2.4k wrote:

You can use the GEOquery package. The function getGEO(filename="data.soft.gz") reads a SOFT file into R; for a GDS file the result is a GDS object, which GDS2eSet() converts into an ExpressionSet. Have a look at http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/r/geo/
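
As a minimal sketch of that route, the example below reads the GDS507 SOFT file that ships with GEOquery and converts it to an ExpressionSet. The choice of `do.log2 = FALSE` and `getGPL = FALSE` is an assumption made here so the example runs offline and leaves the values as submitted; whether a log2 transform is appropriate depends on how the submitter processed the data.

```r
library(GEOquery)
library(Biobase)

# Example SOFT file bundled with GEOquery. system.file() only locates files
# shipped inside an installed package; for a file you downloaded yourself,
# pass its path directly, e.g. getGEO(filename = "GDS402.soft.gz").
soft_path <- system.file("extdata/GDS507.soft.gz", package = "GEOquery")

gds <- getGEO(filename = soft_path)  # a GDS object, not yet an ExpressionSet

# Convert to an ExpressionSet. getGPL = FALSE skips downloading the
# platform annotation from GEO.
eset <- GDS2eSet(gds, do.log2 = FALSE, getGPL = FALSE)

expr  <- exprs(eset)  # probes x samples matrix of the submitter's values
pheno <- pData(eset)  # sample annotation from the GDS subsets
```

The `expr` matrix and `pheno` table are the pieces a downstream meta-analysis package would typically need as input.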

modified 4.8 years ago • written 4.8 years ago by Martombo2.4k

Thank you for the reply! So is it valid to use the processed SOFT data directly, or do I need to go back to the CEL data and normalize the different datasets in one particular way?

written 4.8 years ago by chenyangls10

There is no "right" answer to this question since there is no standard for normalization and not all datasets even supply raw data.  

Just a comment: getGEO() will also take a GEO accession, so there is no need to download SOFT files separately.

written 4.8 years ago by Sean Davis25k

Thanks! I'm just curious: since there is no standard way to normalize, why do people choose to normalize in their own way rather than trusting the original data curators? It adds workload and could even be less reliable.

written 4.8 years ago by chenyangls10

Starting with raw data allows more thorough quality control and also ensures that the normalization was done appropriately (rather than taking the word of the original submitters).  Not everyone does renormalize and not all data on GEO actually have enough raw data to perform an appropriate normalization.
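
For datasets that do provide CEL files, renormalizing from raw data might look like the sketch below. It uses RMA from the affy package mentioned in the question; the helper name `normalize_cel_dir` and the directory name in the usage comment are hypothetical, and RMA is only one of several reasonable normalization choices.

```r
library(affy)

# Hypothetical helper: normalize every CEL file in one directory with RMA.
normalize_cel_dir <- function(cel_dir) {
  stopifnot(dir.exists(cel_dir))
  raw <- ReadAffy(celfile.path = cel_dir)  # AffyBatch of raw probe intensities
  # Background-correct, quantile-normalize, and summarize to probeset level;
  # the result is an ExpressionSet of log2 expression values.
  rma(raw)
}

# Usage (the directory name is a placeholder):
# eset <- normalize_cel_dir("cel_files/study1")
# boxplot(exprs(eset))  # quick check that sample distributions are comparable
```

Applying the same function to each study gives every dataset the same preprocessing, which is the main argument for starting from raw data.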

written 4.8 years ago by Sean Davis25k

Thanks for the comments. I just tried importing GDS402 with GEOquery but got the error "cannot open connection", even though I can download the file from the website. I think I've ruled out internet connection problems, since I can import other datasets, such as GDS507, with GEOquery.

I used the command:

gds <- getGEO(filename=system.file("extdata/GDS402.soft.gz",package="GEOquery"))

because when using getGEO(), I got this error message:

"cannot open destfile 'C:\Users\...\AppData\Local\Temp\RtmpiO3KxZ/GDS402.soft.gz'"

I guess it has something to do with the Windows OS.

So did I do anything wrong when using GEOquery, or did the GEO FTP site change its URL?

written 4.8 years ago by chenyangls10

All you need is:

gds = getGEO('GDS402')
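
A slightly more defensive variant, as a sketch: `getGEO()` has a `destdir` argument (by default the session's temporary directory), so the download can be pointed at a directory that persists across R sessions. The helper name `fetch_geo` and the directory name `geo_cache` are hypothetical.

```r
library(GEOquery)

# Hypothetical helper: download a GEO record into a persistent directory
# instead of the per-session temporary directory.
fetch_geo <- function(accession, dest = "geo_cache") {
  dir.create(dest, showWarnings = FALSE, recursive = TRUE)
  getGEO(accession, destdir = dest)
}

# Usage (requires network access):
# gds  <- fetch_geo("GDS402")
# eset <- GDS2eSet(gds)
```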

written 4.8 years ago by Sean Davis25k

Just tried gds = getGEO('GDS402'), still have the error:

cannot open destfile 'C:\Users\...\AppData\Local\Temp\RtmpiO3KxZ/GDS402.soft.gz', reason 'No such file or directory'

written 4.8 years ago by chenyangls10

Please start a new R session, load the GEOquery library, run the line of code above, and paste in any error message along with the output of sessionInfo().  
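
A few base R checks can be run alongside this, sketched below. The idea that the session's temporary directory had disappeared is only one possible explanation for a "cannot open destfile" error under a Temp path; it is an assumption, not something confirmed in this thread.

```r
# Does this session's temporary directory still exist? If it was removed
# while R was running, downloads written there cannot be created.
tmp_ok <- dir.exists(tempdir())
if (!tmp_ok) {
  message("Temp directory is missing: ", tempdir())
}

# R version, operating system, and attached package versions,
# which is the information requested for the bug report.
print(sessionInfo())
```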

written 4.8 years ago by Sean Davis25k

It worked after starting a new R session. Thanks a lot!

written 4.8 years ago by chenyangls10