Question: How to extract information about normalisation method (e.g. RMA, MAS5) from GEO Soft File?
3.8 years ago, StephanieK wrote:

Hi all, I have a list of data sets like this that I downloaded from GEO:


I just want to know whether the expression values in these datasets were processed with RMA, with some other algorithm, or not transformed at all. In reality I have a lot of datasets, so I was wondering what the best way is to take an input list like the example above and turn it into:

GDS1279: RMA
GDS1280: MAS5
GDS1311: RMA
GDS156: Not transformed
GDS1647: RMA etc

Someone told me to look in the SOFT files. For GDS4858, for example, I can see "dataset_value_type=transformed count", but not which specific software/algorithm (e.g. RMA, MAS5) was used. Could someone show me exactly where in the file this information is? Alternatively, if you could point me to an example webpage where I can find it, I'll do it manually. I know there's also some code you can use with GEO, but I'm not familiar with it, so I'd appreciate sample code that obtains this info from GEO.

Thanks for your time

3.8 years ago, Santosh Anand wrote:

A GDS is a sort of 'virtual agglomeration' of samples whose data have been processed in a similar way, so that they can be compared easily. Just grab any sample from the GDS and look at its properties to get the desired info.

library(GEOquery)

gds = getGEO("GDS1279")
samples = Meta(gds)$sample_id    # comma-separated sample IDs
samples = strsplit(samples, ",")
sample1 = samples[[1]][1]        # get the first sample
sample1
# [1] "GSM74432"

# Query the first sample now
gsm = getGEO(sample1)
Meta(gsm)$data_processing        # data processing info
# [1] "MAS5.0"
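
To run the same lookup over a whole list of GDS IDs, something along these lines might work. It is only a sketch: `processing_for_gds` and `classify_processing` are hypothetical helper names, not part of GEOquery, and the keyword matching is an assumption about how submitters typically word the free-text `data_processing` field. Some sample records may have no such field at all, so unmatched entries should still be read by eye.

```r
# Sketch only: both helpers below are hypothetical, not GEOquery functions.
# Requires the GEOquery package and network access for the lookup itself.

# Fetch the free-text data-processing description of the first sample in a GDS.
processing_for_gds <- function(gds_id) {
  gds <- GEOquery::getGEO(gds_id)
  samples <- strsplit(GEOquery::Meta(gds)$sample_id, ",")
  first_sample <- trimws(samples[[1]][1])
  gsm <- GEOquery::getGEO(first_sample)
  GEOquery::Meta(gsm)$data_processing   # may be NULL if the submitter left it out
}

# Map the free text to a short label by keyword (assumed wording, not exhaustive).
classify_processing <- function(txt) {
  txt <- paste(txt, collapse = " ")     # NULL or several lines -> one string
  if (!nzchar(txt)) return("not stated")
  if (grepl("gc-?rma", txt, ignore.case = TRUE, perl = TRUE)) return("GCRMA")
  if (grepl("\\brma\\b", txt, ignore.case = TRUE, perl = TRUE)) return("RMA")
  if (grepl("\\bmas\\s*5", txt, ignore.case = TRUE, perl = TRUE)) return("MAS5")
  "other"
}

# Example run over the IDs from the question; skipped in non-interactive sessions.
if (interactive()) {
  ids <- c("GDS1279", "GDS1280", "GDS1311", "GDS156", "GDS1647")
  labels <- vapply(ids, function(id) classify_processing(processing_for_gds(id)),
                   character(1))
  cat(paste0(ids, ": ", labels), sep = "\n")
}
```

Checking only the first sample relies on the GDS convention that all samples in a dataset were processed equivalently. For anything labelled "other" or "not stated", print the raw `processing_for_gds()` text and judge it manually.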

See the GEOquery vignette for further help (esp. section 1.4, Datasets):

1.4 Datasets

GEO DataSets (GDSxxx) are curated sets of GEO Sample data. A GDS record represents a collection of biologically and statistically comparable GEO Samples and forms the basis of GEO’s suite of data display and analysis tools. Samples within a GDS refer to the same Platform, that is, they share a common set of probe elements. Value measurements for each Sample within a GDS are assumed to be calculated in an equivalent manner, that is, considerations such as background processing and normalization are consistent across the dataset. Information reflecting experimental design is provided through GDS subsets.
