**30**wrote:

Hello,

I am wondering about the data analysis strategy implemented to calculate the differential expression analysis between wildtype and knockout samples for 2 cell lines. It is a Illumina bead array chip data obtained from [GEO].
I have total of 4 samples for each cell line (2 controls, 2 treated).
End output I would like to obtain is foldchange and pvalues; for comparisons done between controls and treated samples from 2 cell lines.
*Output: B_vs_A_log2FC B_vs_A_pvalue D_vs_C_log2FC D_vs_C_pvalue*

I have my code (below), that could perform my task.

But I am wondering about 3 questions:

1) Is it correct to apply *neqc* function to normalize the expression values of .idat files

2) My design matrix without weights?

3) As both cell lines contains 2 replicates, is this limma calculation makes sense? and the reliability of differential expression values (pvalues and foldchanges)

Any valuable inputs or suggestions?

```
datadir <- 'GSExxx_RAWdata/'
idatfiles <- dir(path=datadir, pattern=".idat")
bgxfile <- dir(path=datadir, pattern=".bgx")
x <- read.idat(paste(datadir,idatfiles,sep=""), paste(datadir,bgxfile,sep = ""))
x$other$Detection <- detectionPValues(x)
**y <- neqc(x)**
## eliminating low detection probes
isexpr <- rowSums(y$other$Detection < 0.05) >= 3
y <- y[isexpr,]
data <- y$E
## add gene symbols
rownames(data) <- y$genes$Symbol
colnames(data) <- c('MD_NC_rep1','MD_NC_rep2',
'MD_siPK1_rep1','MD_siPK1_rep2',
'HT_NC_rep1','HT_NC_rep2',
'HT_siPK1_rep1','HT_siPK1_rep2')
head(data)
i=1; j=i+3; k=j+1; l=k+3
data_MD <- data[,i:j]
data_HT <- data[,k:l]
**design <- model.matrix(~0+factor(c(0,0,1,1)))**
colnames(design) <- c("Wild_type", "Mutant")
# > design
# Wild_type Mutant
# 1 1 0
# 2 1 0
# 3 0 1
# 4 0 1
# attr(,"assign")
# [1] 1 1
# attr(,"contrasts")
# attr(,"contrasts")$`factor(c(0, 0, 1, 1))`
# [1] "contr.treatment"
**cont_matrix <- makeContrasts(MutvsWT = Mutant-Wild_type, levels=design)**
# > cont_matrix
# Contrasts
# Levels MutvsWT
# Wild_type -1
# Mutant 1
op.matrix <- data
for (i in 1:2){
if (i==1){data=data_MD; Nm="MD_siPK1_vs_MD_NC"; tg=targets[c(1:4),]}
if (i==2){data=data_HT; Nm="HT_siPK1_vs_HT_NC"; tg=targets[c(5:8),]}
fit <- lmFit(data, design)
fit_contrast <- contrasts.fit(fit, cont_matrix)
fit_contrast <- eBayes(fit_contrast)
# Generate a list of top 100 differentially expressed genes
output <- topTreat(fit_contrast, coef=1, number=Inf, adjust.method="BH")
p1 <- output[,"P.Value"]
FC <- output[,"logFC"]
q1 <- p.adjust(p1, method="fdr");
p1 <- signif(p1, digits=2);
q1 <- signif(q1, digits=2);
FC <- round(FC, digits=2) ;
P.matrix <- cbind.data.frame(op.matrix, FC, p1, q1);
colnames(op.matrix)[(ncol(op.matrix)-2):ncol(op.matrix)] <- c(
paste(Nm, "-log2FC", sep=""),
paste(Nm, "-p", sep=""),
paste(Nm, "-FDR", sep=""));
}
op.matrix <- cbind.data.frame('Gene'=rownames(op.matrix),op.matrix)
```