Question

Is there a need for background correction and substitution of duplicates with the mean when we use the getGEO function of R in microarray analysis?

4.3 years ago

Sib ▴ 60

The getGEO function retrieves datasets and their annotations from GEO in R for microarray analysis, and GEO2R uses this function to fetch datasets. In the GEO2R scripts for many GEO series, I noticed that the tool no longer performs background correction or replaces duplicate probes with their mean. I wondered whether this is because of getGEO. In other words, is it true that whenever we obtain a GEO dataset with getGEO, the data are already background corrected and duplicates are already replaced with their mean? If so, when we run the analysis ourselves in R (without GEO2R) on data obtained through this function, is there no longer any need for background correction and for replacing duplicates with the mean?

R GEO Microarray analysis • 1.3k views

Not sure if this is sufficiently different from "In R scripts of GEO2R which line is responsible for background correction and replacing replicated probes with the mean?" to be an independent question, since it has essentially been answered there already. getGEO gives you what the authors have uploaded. What exactly that is, you have to check in either the methods text or the metadata provided with the dataset.

Also read Sean Davis's answer at https://support.bioconductor.org/p/31117/ regarding preprocessed versus raw data.

Reply • 4.3 years ago by ATpoint 81k

No, these are two different questions. I want to analyze microarray data in R. What I want to know is whether, if I get the datasets with getGEO, I should perform background correction and replace duplicates with the mean, or whether I can skip these steps as GEO2R does.

Reply • 4.3 years ago by Sib ▴ 60

"getGEO gives you what the authors have uploaded. What exactly that is, you have to check in either the methods text or the metadata provided with the dataset."

...means that whether you have to do that depends on whether the authors had already done it before submitting the data to GEO (which is then what getGEO gives you). Check the methods section and the metadata.

Reply • 4.3 years ago by ATpoint 81k

Where are the method section and metadata?

Reply • 4.3 years ago by Sib ▴ 60

score 0 · Answer 1 · 2020-01-05

OK, let's break this down:

Say you have a GEO accession number: GSE76427. You then go to look it up on the GEO website: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76427
As you see on the GEO website for that particular study, "Raw data for the entire 115 Singapore HCC cohort was obtained via the GenomeStudio; data have been normalized using the RSN method from R-package lumi."
When you use getGEO, you get exactly what the authors have provided: the RSN-normalized data. Did they do anything about control probes? No (and note, if you look it up, there ARE control probes on this particular chip: Illumina HumanHT-12 V4.0 expression beadchip). Did they do background correction? No. Did they do anything about replicate probes? No.
Therefore, if you do your microarray analysis with getGEO, you're doing it on their RSN-normalized data and you are NOT doing background correction or anything like that.
Does one need to do background correction? There are certainly advantages and disadvantages to doing so, and there is a LOT of discussion on this, e.g. https://support.bioconductor.org/p/12547/
What about replicate probes? Like I said, only probe-level data is provided (as is usually the case). If you want to summarize replicate probes to the gene level, you'll have to do that yourself.
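To make "summarize replicate probes to the gene level" concrete, here is a minimal sketch of the averaging step. It is written in plain Python purely as an illustration; in R the same idea is commonly done with limma's avereps. The function name, the gene identifiers, and the toy values are all hypothetical, and averaging is only one of several possible ways to summarize probes.

```python
from collections import defaultdict
from statistics import mean


def average_replicates(rows):
    """Average probe-level rows that map to the same gene.

    rows: iterable of (gene_id, [one expression value per sample]).
    Returns a dict mapping gene_id -> list of per-sample means.
    """
    grouped = defaultdict(list)
    for gene_id, values in rows:
        grouped[gene_id].append(values)

    averaged = {}
    for gene_id, profiles in grouped.items():
        # zip(*profiles) walks sample by sample across the replicate probes.
        averaged[gene_id] = [mean(sample_values) for sample_values in zip(*profiles)]
    return averaged


# Hypothetical toy data: two probes for GENE_A, one for GENE_B, two samples.
probe_rows = [
    ("GENE_A", [8.0, 9.0]),
    ("GENE_A", [10.0, 11.0]),
    ("GENE_B", [5.0, 6.0]),
]
gene_level = average_replicates(probe_rows)
print(gene_level)
```

Genes with a single probe pass through unchanged, so the step is safe to apply to the whole matrix.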
If you are not happy with their normalization and want to do your own normalization, download their unnormalized data via the file GSE76427_non-normalized.txt.gz (supplied on the GEO website) and then you can do your own normalization.
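If you would rather script the download than click through the website, the supplementary file's location can usually be derived from the accession. The sketch below only builds the URL and does not download anything. It assumes the directory layout GEO uses on its FTP server (series grouped into folders such as GSE76nnn); the helper name is hypothetical, and the download link shown on the GEO page remains the authoritative source. In R, GEOquery's getGEOSuppFiles is the usual way to fetch these files.

```python
def geo_series_suppl_url(accession, filename):
    """Build the expected URL of a supplementary file for a GEO series.

    Assumption: files live under
    https://ftp.ncbi.nlm.nih.gov/geo/series/<bucket>/<accession>/suppl/
    where <bucket> is the accession with its last three digits replaced by 'nnn'.
    """
    prefix, digits = accession[:3], accession[3:]
    if prefix != "GSE" or not digits.isdigit():
        raise ValueError(f"not a GEO series accession: {accession!r}")
    bucket = "GSE" + digits[:-3] + "nnn"
    return (
        "https://ftp.ncbi.nlm.nih.gov/geo/series/"
        f"{bucket}/{accession}/suppl/{filename}"
    )


url = geo_series_suppl_url("GSE76427", "GSE76427_non-normalized.txt.gz")
print(url)
```

Check the printed URL against the link on the series page before relying on it in a pipeline.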
If you're using GEO2R but can't customize the differential gene expression analysis to your liking, install limma and write R scripts to perform the analysis yourself.