Question: Why does the same Illumina expression data, obtained via GEOquery and by direct download, show different values after normalization?
BioMed wrote, 2.7 years ago:

Dear all,

I have a question and would appreciate your help. Suppose I need to process the GSE39340 data set.

M1. I used GEOquery to get the data and normalized it with the commands below:

    library(GEOquery)
    library(lumi)
    eset <- getGEO("GSE39340")
    lumi.N.Q <- lumiExpresso(eset$GSE39340_series_matrix.txt.gz, normalize.param = list(method='rsn'))
    write.exprs(lumi.N.Q, file = 'processedExampledata.txt')

M2. Instead of using GEOquery, I downloaded the txt file directly from GEO (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE39340) and processed it with the same method, as shown below:

    library(lumi)
    # Read the non-normalized intensities downloaded from the GEO series page
    example.lumi <- lumiR("GSE39340_non_normalized.txt")
    lumi.A.B <- lumiExpresso(example.lumi, normalize.param = list(method='rsn'))
    write.exprs(lumi.A.B, file = 'processedExampledata1.txt')

However, when I compare the output files, the expression values for the same probe and sample are quite different. For example, ILMN_1343295 in GSM966273 (aka E31) was 11.67 in processedExampledata but 11.80 in processedExampledata1. I don't know why.
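
For anyone wanting to reproduce this comparison, here is a minimal sketch in base R. It assumes both files are the tab-delimited tables written by write.exprs (probe IDs in the first column). The helper name `compare_sample` is hypothetical, and the sample column names (GSM966273 in the first file, E31 in the second) are taken from the example above and may differ in your files.

```r
# Compare one sample between two tables written by write.exprs()
# (tab-delimited, probe IDs in the first column).
# compare_sample is a hypothetical helper, not part of lumi or GEOquery.
compare_sample <- function(file1, file2, sample1, sample2) {
  m1 <- read.delim(file1, row.names = 1, check.names = FALSE)
  m2 <- read.delim(file2, row.names = 1, check.names = FALSE)

  # Restrict to probes present in both files, in the same order
  common <- intersect(rownames(m1), rownames(m2))
  x <- m1[common, sample1]
  y <- m2[common, sample2]

  # One row per shared probe, with the per-probe difference
  data.frame(probe = common, value1 = x, value2 = y, diff = x - y)
}

# Usage (column names are assumptions based on the example above):
# res <- compare_sample("processedExampledata.txt", "processedExampledata1.txt",
#                       "GSM966273", "E31")
# summary(res$diff)
# res[res$probe == "ILMN_1343295", ]
```

A systematic shift across most probes, rather than a handful of outliers, would suggest that the two routes started from different input values.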

Please let me know where I went wrong.

Thank you.

andrew.j.skelton73 (London) wrote, 2.7 years ago:

GEO generally holds both a normalised and a "raw" version of the data. I suspect that the getGEO function is, by default, pulling down the pre-normalised series matrix, so in M1 you are normalising data that has already been normalised. That's why you're seeing the differences.

If you want to accurately reproduce the authors' normalisation strategy, it's often best to contact them directly. You can, however, look for clues, such as this note included in one of the samples' metadata:

The data were normalised using quantile normalisation with Illumina Genomestudio V2011.1 and gene expression module (1.9.0).
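
A rough sketch of how one might check this from R follows. It assumes the GEOquery and Biobase packages are installed, and that the processing note shows up in phenotype-data columns whose names start with `data_processing` (typical for GEO series matrices, but worth verifying for your series). The wrapper name `inspect_geo_series` is hypothetical.

```r
library(GEOquery)
library(Biobase)

# inspect_geo_series is a hypothetical helper for illustration only.
inspect_geo_series <- function(accession) {
  # With GSEMatrix = TRUE (the default), getGEO() parses the series matrix,
  # i.e. the values as the submitter deposited them.
  gse <- getGEO(accession, GSEMatrix = TRUE)
  eset <- gse[[1]]

  # Assumption: processing notes live in columns named data_processing*
  proc_cols <- grep("^data_processing", colnames(pData(eset)), value = TRUE)

  list(
    processing    = unique(pData(eset)[, proc_cols, drop = FALSE]),
    value_summary = summary(as.vector(exprs(eset)))
  )
}

# Usage (needs network access):
# inspect_geo_series("GSE39340")
# supp <- getGEOsuppFiles("GSE39340")  # downloads the series' supplementary files
# rownames(supp)                       # paths of the downloaded files
```

If the metadata mentions normalisation, the series matrix is probably not the right starting point for lumiExpresso. The supplementary files are where a non-normalised table, if one was deposited, would normally be found.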


Thank you very much for pointing that out. So we should be careful when using getGEO to gather Illumina array data.

Reply written 2.7 years ago by BioMed