The getGEO function is a way to fetch datasets and their annotations from GEO into R for microarray analysis, and GEO2R uses it to retrieve datasets. In the GEO2R scripts for many GEO series, I noticed that the tool no longer performs background correction or replaces duplicates with their mean values. I wondered whether this is because of getGEO. In other words, is it true that whenever we obtain GEO datasets with getGEO, the data are already background corrected and duplicates are already replaced by their mean? If so, when we run our analysis in R (without GEO2R) on datasets obtained through this function, is there no longer any need for background correction or for replacing duplicates with the mean?
Question: Is there a need for background correction and substitution of duplicates with the mean when we use the getGEO function of R in microarray analysis?
11 months ago by
Sib • 20
11 months ago by
dsull • 1.6k
OK, let's break this down:
- Say you have a GEO accession number: GSE76427. You then go to look it up on the GEO website: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76427
- As you see on the GEO website for that particular study, "Raw data for the entire 115 Singapore HCC cohort was obtained via the GenomeStudio; data have been normalized using the RSN method from R-package lumi."
- When you use getGEO, you are using exactly what the authors have provided: the RSN-normalized data. Did they do anything about control probes? No (and note, if you look it up, there ARE control probes on this particular chip: Illumina HumanHT-12 V4.0 expression beadchip). Did they do background correction? No. Did they do anything about replicate probes? No.
- Therefore, if you do your microarray analysis with getGEO, you're doing it on their RSN-normalized data and you are NOT doing background correction or anything like that.
- Does one need to do background correction? There are certainly advantages and disadvantages to doing so, and there is LOTS of discussion on this, e.g. https://support.bioconductor.org/p/12547/
- What about replicate probes? Like I said, only probe-level data is provided (as is usually the case). If you want to summarize replicate probes to the gene level, you'll have to do that yourself.
- If you are not happy with their normalization and want to do your own normalization, download their unnormalized data via the file GSE76427_non-normalized.txt.gz (supplied on the GEO website) and then you can do your own normalization.
- If you're using GEO2R but you can't customize the differential gene expression analysis to your liking, download limma and write R scripts to perform the analysis yourself.
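To make the replicate-probe point concrete, here is a minimal sketch of what "summarizing replicate probes to the gene level" by averaging amounts to. It is plain Python used only to show the arithmetic, not the code that GEO2R or limma runs. The function name `average_replicate_probes`, the probe IDs, the gene names, and the values are all hypothetical, and the sketch assumes the expression values are already normalized and on a comparable (for example log2) scale.

```python
from collections import defaultdict
from statistics import mean


def average_replicate_probes(probe_to_gene, expression):
    """Collapse probe-level rows to gene-level rows by averaging per sample.

    probe_to_gene: dict mapping probe ID -> gene symbol (hypothetical annotation)
    expression:    dict mapping probe ID -> list of values, one per sample
    """
    grouped = defaultdict(list)
    for probe_id, values in expression.items():
        gene = probe_to_gene.get(probe_id)
        if gene is None:
            # Probes without a gene annotation (e.g. control probes) are skipped.
            continue
        grouped[gene].append(values)

    # zip(*rows) walks the samples column by column across a gene's probes.
    return {
        gene: [mean(sample_values) for sample_values in zip(*rows)]
        for gene, rows in grouped.items()
    }


# Hypothetical toy data: two probes map to GENE_A, one to GENE_B,
# and one control probe has no gene annotation.
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}
expression = {
    "probe_1": [8.0, 9.0],
    "probe_2": [10.0, 11.0],
    "probe_3": [5.0, 6.0],
    "control_1": [1.0, 1.0],
}

gene_level = average_replicate_probes(probe_to_gene, expression)
# gene_level == {"GENE_A": [9.0, 10.0], "GENE_B": [5.0, 6.0]}
```

With the toy data, the two probes for GENE_A are averaged sample by sample and the unannotated control probe is dropped. In R, limma's avereps function is one option for the same step on a real expression matrix.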