Question: Problem analyzing microarray data from Illumina HumanHT-12
Asked 11 months ago by Leite:

Hello everyone,

I'm having trouble analyzing Illumina HumanHT-12 microarray data from public datasets such as E-MTAB-5273 and GSE54514.

E-MTAB-5273 has two files, E-MTAB-5273.raw.1.zip and E-MTAB-5273.processed.1.zip, both containing .txt files.

GSE54514 also has two files, GSE54514_RAW.tar and GSE54514_non-normalized.txt.gz, containing a .bgx file and a .txt file respectively.

First question: does the E-MTAB-5273.raw file contain the non-normalized data, and the processed file the normalized data?

Second question: what is the best way to analyze non-normalized Illumina HumanHT-12 data?

Best regards,

Leite

illumina humanht-12 R • 383 views
written 11 months ago by Leite

Dear colleagues,

I found some answers:

First question: does the E-MTAB-5273.raw file contain the non-normalized data, and the processed file the normalized data? Yes, the .raw file in E-MTAB-5273 contains the non-normalized data, and the processed file is the normalized data uploaded to the database. The .bgx file, in turn, is a manifest file ("Describe the contents of each microarray, including the probe names and sequences among many other things").

Second question: what is the best way to analyze non-normalized Illumina HumanHT-12 data?

library(limma)

# Read in the expression profiles
x <- read.ilmn("Burnham_sepsis_discovery_raw_237.txt", probeid="PROBE_ID", other.columns="detection")

# Background correction, quantile normalization and log2 transformation
y <- neqc(x)
dim(y)

My question is: how do I tell R which samples are controls and which are patients, so that I can build the design matrix and find the DEGs?

written 11 months ago by Leite

You now just have to perform differential expression analysis on the normalised log2 intensities contained in the y object. To determine which samples are patients and which are controls, just consult the metadata. For example, the information on patients and controls for GSE54514 can be found here: https://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE54514

To implement this in R, you just have to create the model matrix and ensure that its rows correspond to the columns (samples) of your expression set, in the same order.
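As a rough illustration of that step, here is a minimal sketch in base R. The sample IDs and the `targets` table are made up for the example; in a real analysis `sample_ids` would be `colnames(y$E)` and `targets` would be built from the study's sample annotation. The key point is the `match()` call, which reorders the metadata to follow the expression columns before the factor and design matrix are created.

```r
# Toy stand-in for colnames(y$E): sample IDs in the order of the expression columns
# (hypothetical IDs, not taken from either dataset)
sample_ids <- c("S1", "S2", "S3", "S4", "S5", "S6")

# Hypothetical metadata table, deliberately in a different order
targets <- data.frame(
  sample = c("S3", "S1", "S6", "S2", "S5", "S4"),
  status = c("Patient", "Control", "Patient", "Control", "Patient", "Control"),
  stringsAsFactors = FALSE
)

# Reorder the metadata so that row i describes expression column i
targets <- targets[match(sample_ids, targets$sample), ]
stopifnot(identical(targets$sample, sample_ids))

# Put "Control" first so that it becomes the reference level
Group <- factor(targets$status, levels = c("Control", "Patient"))

# One row per sample; the second column codes Patient vs Control
design <- model.matrix(~ Group)
rownames(design) <- sample_ids
design
```

With the design built this way, the second coefficient (`GroupPatient`) is the patient versus control comparison, and the matrix can be passed to `lmFit()` as in the limma examples elsewhere in this thread.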

As both studies used the same microarray type, you can merge the raw data files together and then normalise them together. You may notice a batch effect, but you can adjust for this in the limma design model.
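One way to adjust for the study of origin is to add it as a second factor in the design, sketched below in base R. The group labels and the four-samples-per-study layout are invented for illustration; only the two accession numbers come from this thread.

```r
# Hypothetical annotation for a merged data set (toy sizes: 4 samples per study)
Group <- factor(
  c("Control", "Patient", "Control", "Patient",
    "Control", "Patient", "Control", "Patient"),
  levels = c("Control", "Patient")
)
Batch <- factor(rep(c("E-MTAB-5273", "GSE54514"), each = 4))

# Additive model: the batch term absorbs the study-level shift,
# leaving the Group coefficient as the patient vs control effect
design <- model.matrix(~ Batch + Group)
colnames(design)
```

In a limma fit with this design, the comparison of interest would be the `GroupPatient` coefficient rather than `coef=2`, since the second column is now the batch term. This only works if both groups are present in both studies; if one study contained only patients, batch and group could not be separated.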

written 11 months ago by Kevin Blighe

Dear Kevin,

I found an answer in this post by Gordon Smyth, https://support.bioconductor.org/p/92834/, but I still don't understand how he specified which samples are "controls" and which are "patients".

> library(limma)
> x <- read.ilmn("GSE74629_non-normalized.txt",expr="SAMPLE ",probeid="ID_REF")
Reading file GSE74629_non-normalized.txt ... ...
> y <- neqc(x)
Note: inferring mean and variance of negative control probe intensities from the
detection p-values.
> Group <- rep(c("PDAC","Healthy"),c(36,14))
> Group <- factor(Group)
> design <- model.matrix(~Group)
> keep <- rowSums(y$E>5) >= 14
> y2 <- y[keep,]
> fit <- lmFit(y2,design)
> fit <- eBayes(fit,trend=TRUE,robust=TRUE)
> topTable(fit,coef=2)

Best, Leite

written 11 months ago by Leite
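For what it is worth, the group assignment in the quoted code happens entirely in the `rep()` line. The sketch below reruns just that part in base R to show what it produces. It appears to rely on the assumption that the first 36 columns of the file are PDAC samples and the last 14 are healthy; the code itself does not check this.

```r
# rep() with a vector of counts repeats each label that many times, in order:
# 36 x "PDAC" followed by 14 x "Healthy", one label per expression column
Group <- rep(c("PDAC", "Healthy"), c(36, 14))
Group <- factor(Group)

# Factor levels are sorted alphabetically, so "Healthy" is the reference level
levels(Group)
table(Group)

# The second design column is therefore PDAC vs Healthy,
# which is what topTable(fit, coef=2) reports
design <- model.matrix(~Group)
colnames(design)
```

So the labels are assigned purely by column position. If the samples in a file are not grouped in contiguous blocks like this, building the factor from a metadata table matched to the column names is the safer route.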