Processing Illumina HumanHT-12 V4.0 data without detection p-values
2.1 years ago

Hello, I want to pre-process the GSE17048 dataset from GEO. I followed the steps provided by Kevin in this post, but I realized that neqc from limma requires the detection p-values of each probe to normalize the data. Is there a way to background-correct and normalize the data without the p-values?

Thank you.

Illumina HumanHT-12v4 pre-processing limma • 1.0k views
2.1 years ago
Gordon Smyth ★ 7.0k

The group that deposited the data hasn't done you any favors. Without either the control probes or the detection p-values, it is not possible to use limma's neqc method. But you can use the normexp background correction method instead:

> library(limma)
> y.raw <- read.delim("GSE17048_non-normalized_data.txt.gz",row.names=1)
> y.bgcorrected <- backgroundCorrect(y.raw, method="normexp")
> y.norm <- normalizeBetweenArrays(log2(y.bgcorrected), method="quantile")

I see! Thank you. I have a lot of datasets with a similar issue. Could you please elaborate on what would be different if the data had control probes but no p-values, and on how to identify the control probes? Would I just use the Illumina annotation files and check whether control probes are present?

Also, I have not really looked into the neqc code, so could you please explain what the main difference is between neqc and just using these two functions?


Illumina software writes the control probe intensities to a separate file, which unfortunately researchers seldom upload to GEO, although they should.

Sometimes people will provide the Illumina IDAT files, which are the binary version of the intensity data. In that case you can use read.idat to get intensities and detection p-values.
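
For IDAT files, the workflow might look like the sketch below, which follows the pattern shown in limma's read.idat help page. The helper name process_idat and its two arguments are hypothetical and only wrap the calls for illustration. read.idat also needs the matching BGX manifest file and the illuminaio package, and the detection p-values are computed from the negative control probes with detectionPValues.

```r
library(limma)

# Hypothetical helper (not part of limma): idat_files is a character vector
# of IDAT paths and bgx_file is the path to the matching BGX manifest.
process_idat <- function(idat_files, bgx_file) {
  # Read raw intensities; control probes are kept and flagged in x$genes$Status
  x <- read.idat(idat_files, bgx_file)
  # Compute detection p-values from the negative control probes
  x$other$Detection <- detectionPValues(x)
  # Control-probe-guided background correction, quantile normalization, log2
  neqc(x)
}
```

Nothing is read until the helper is called with your own file paths, so the block can be sourced safely even when no IDAT files are at hand.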

To understand what neqc does, type help("neqc") and read the reference that is listed. Basically neqc uses the negative control probes to estimate the normal part of the normal + exponential convolution whereas normexp has to estimate the whole convolution directly without being guided by control probes.
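
To make the difference concrete, here is a minimal sketch on simulated data, not on GSE17048. All object names, probe counts and distribution parameters are assumptions chosen for illustration. It runs neqc with a status vector that marks the negative controls, and then the normexp fallback on the regular probes alone, as you would have to do when no controls were deposited.

```r
library(limma)

set.seed(1)
n_regular  <- 2000  # simulated regular probes
n_negative <- 200   # simulated negative control probes
n_arrays   <- 4
n_probes   <- n_regular + n_negative

# Observed intensity = normal background + exponential signal.
# Negative control probes carry background only.
background <- matrix(rnorm(n_probes * n_arrays, mean = 100, sd = 10),
                     ncol = n_arrays)
signal <- rbind(
  matrix(rexp(n_regular * n_arrays, rate = 1 / 500), ncol = n_arrays),
  matrix(0, nrow = n_negative, ncol = n_arrays)
)
x <- background + signal
dimnames(x) <- list(paste0("probe", seq_len(n_probes)),
                    paste0("array", seq_len(n_arrays)))
status <- rep(c("regular", "negative"), c(n_regular, n_negative))

# With control probes: the negative controls guide the background estimate.
# neqc returns normalized log2 values with the control rows removed.
y_neqc <- neqc(x, status = status)

# Without control probes: normexp has to fit the whole convolution
# from the regular probe intensities.
x_regular <- x[status == "regular", ]
y_normexp <- normalizeBetweenArrays(
  log2(backgroundCorrect(x_regular, method = "normexp")),
  method = "quantile"
)
```

Comparing y_neqc and y_normexp, for example with plotDensities, gives a feel for how much the two background estimates differ on a given platform.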


I see, that is very helpful and enlightening. Thank you for the explanation.
