Question

Processing Illumina HumanHT-12 V4.0 data without detection p-values

2.1 years ago

vinaysbharadhwaj ▴ 40

Hello, I want to pre-process the GSE17048 dataset from GEO. I followed the steps provided by Kevin in this post, but I realized that neqc from limma requires the detection p-values of each probe to normalize the data. Is there a way to background-correct and normalize the data without using the p-values?

Thank you.

Illumina HumanHT-12v4 pre-processing limma • 1.0k views

score 5 · Accepted Answer · 2022-03-04

2.1 years ago

Gordon Smyth ★ 7.0k

The group that deposited the data hasn't done you any favors. Without either the control probes or the detection p-values, it is not possible to use limma's neqc method. But you can use the normexp background correction method instead:

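> library(limma)
> # Read the raw probe intensities; the first column supplies the probe IDs
> y.raw <- read.delim("GSE17048_non-normalized_data.txt.gz",row.names=1)
> # Model-based (normal + exponential) background correction without control probes
> y.bgcorrected <- backgroundCorrect(y.raw, method="normexp")
> # Log2-transform, then quantile-normalize across arrays
> y.norm <- normalizeBetweenArrays(log2(y.bgcorrected), method="quantile")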

I see! Thank you. I have a lot of datasets with a similar issue. Could you please elaborate on what would be different if the data had control probes but no p-values, and on how to identify the control probes? Would I just use the Illumina annotation files and check whether control probes are present?

Also, I have not really looked into the neqc code, so could you please explain what the main difference is between neqc and just using these two functions?

2.1 years ago by vinaysbharadhwaj ▴ 40

Illumina software writes the control probe intensities to a separate file, which unfortunately researchers seldom upload to GEO, although they should.
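
If that control probe file is available, a minimal sketch of the workflow could look like the following. The two file names are hypothetical placeholders, not files known to exist for any particular GEO series; read.ilmn and neqc are limma functions, and the file.exists guard is only there so the snippet does nothing when the files are absent.

```r
library(limma)

# Hypothetical file names: substitute the probe summary and control probe
# profile files exported by the Illumina software for your own study.
probe_file   <- "sample_probe_profile.txt"
control_file <- "control_probe_profile.txt"

if (file.exists(probe_file) && file.exists(control_file)) {
  # Read regular and control probes into a single EListRaw object
  x <- read.ilmn(files = probe_file, ctrlfiles = control_file)

  # Probe types are recorded in x$genes$Status ("regular", "negative", ...)
  print(table(x$genes$Status))

  # Background-correct using the negative controls, then quantile-normalize
  y <- neqc(x)
}
```

Tabulating x$genes$Status is a quick way to check whether control probes were actually read in.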

Sometimes people will provide the Illumina idat files, which are the binary version of the intensity data. In that case you can use read.idat to get intensities and detection p-values.
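
As a rough sketch of the idat route, assuming the idat files and the matching Illumina manifest (.bgx) file sit in the working directory: the file-discovery patterns below are assumptions about how the files are named, read.idat also needs the illuminaio package to be installed, and the detection p-values are computed from the negative controls with detectionPValues.

```r
library(limma)

# Assumed layout: idat files and one .bgx manifest in the working directory
idat_files <- dir(pattern = "\\.idat$")
bgx_file   <- dir(pattern = "\\.bgx$")

if (length(idat_files) > 0 && length(bgx_file) == 1) {
  # Read intensities for regular and control probes (requires illuminaio)
  x <- read.idat(idat_files, bgx_file)

  # Compute detection p-values from the negative control probes
  x$other$Detection <- detectionPValues(x)

  # Control-probe-guided background correction plus quantile normalization
  y <- neqc(x)
}
```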

To understand what neqc does, type help("neqc") and read the reference that is listed. Basically, neqc uses the negative control probes to estimate the normal part of the normal + exponential convolution, whereas normexp has to estimate the whole convolution directly, without being guided by control probes.
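
To make the comparison concrete, here is a small simulation sketch, not taken from any real dataset. All sizes and distribution parameters are arbitrary assumptions: background noise is normal, true signal is exponential, and negative controls carry background only. It runs neqc with the control probes and the two-step normexp route without them on the same simulated arrays.

```r
library(limma)
set.seed(1)

# Arbitrary simulation settings (assumptions for illustration only)
n_regular  <- 2000
n_negative <- 200
n_arrays   <- 4

# One array = normal background + exponential signal (zero for negatives)
sim_array <- function() {
  background <- rnorm(n_regular + n_negative, mean = 100, sd = 10)
  signal     <- c(rexp(n_regular, rate = 1 / 500), rep(0, n_negative))
  background + signal
}
x <- sapply(seq_len(n_arrays), function(i) sim_array())
colnames(x) <- paste0("array", seq_len(n_arrays))
status <- rep(c("regular", "negative"), c(n_regular, n_negative))

# Route 1: neqc estimates the background (normal) part from the negatives
y_neqc <- neqc(x, status = status)
# Defensive check: keep regular probes only if controls are still present
if (nrow(y_neqc) == length(status)) {
  y_neqc <- y_neqc[status == "regular", , drop = FALSE]
}

# Route 2: normexp fits the whole convolution from the regular probes alone
x_regular <- x[status == "regular", , drop = FALSE]
y_normexp <- normalizeBetweenArrays(
  log2(backgroundCorrect(x_regular, method = "normexp")),
  method = "quantile"
)

# Compare the two sets of normalized log2 values on the first array
print(summary(y_neqc[, 1] - y_normexp[, 1]))
```

Note that neqc also adds an offset (16 by default) before the log transformation, so the two routes are not expected to agree exactly at low intensities.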