Question: MIcroarray data analysis using limma

0

banerjeeshayantan •

**170**wrote:I am trying to analyze this GEO dataset. (GSE66099). There are two groups of patients that I am interested in finding the DEGs. I am attaching my code below. Kindly comment on the code if something is wrong.

Assume that I have normalized the data and it is available in a dataframe called newrma

```
arr=numeric(276) #I have 276 samples in total
arr[which(clinical_data$Phenotype1=="YES")]="YES"
arr[which(clinical_data$Phenotype=="NO")]="NO"
arr[which(arr=="0")]<-"X"
arr[which(arr=="YES")]<-"1"
arr[which(arr=="NO")]<-"0"
gsms=paste(arr, collapse = "")
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
#removing all 0s from the arr list
sel <- which(sml != "X")
sml <- sml[sel]
#the first four columns of newrma have probeid, entrez gene id, gene symbol etc
x=newrma[,5:length(colnames(newrma))][,sel]
sml <- paste("G", sml, sep="") # set group names
fl <- as.factor(sml)
g=t(x)
g$description <- fl
arr=arr[which(arr=="0" | arr=="1")]
design <- model.matrix(~0+factor(c(arr)))
colnames(design) <- levels(fl)
x1=newrma[,5:length(colnames(newrma))][,sel]
x1[] <- lapply(x1, function(x) as.numeric(as.character(x)))
fit <- lmFit(x1, design)
contrast.matrix <- makeContrasts(diff=G1-G0, levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
tT= topTable(fit2, number=250, p.value = 0.05, adjust.method = "fdr", sort.by="logFC")
#mapping the probenames to gene names
tT=cbind(tT,Gene_name=newSymbols[match(rownames(tT),newSymbols$PROBEID),4])
#tT=tT[!duplicated(tT$ID),]
#sorting by adjusted p value
tT_pvalue=tT[order(tT$adj.P.Val,decreasing = FALSE),]
#considering a log fold change of > 1
tT_signi_up_down=subset(tT_pvalue, tT_pvalue$logFC > 1 | tT_pvalue$logFC < -1)
```

I am really not sure that I am doing the steps correctly. Can you please review it and give your feedback?

I do not see anything majorly incorrect about your code. Why are you asking? Do you really need the following line:

?

65kthe x1 dataframe had all the expression values as factor variables. Hence the need for conversion. I exactly do not know when the columns of x dataframe became all factors.

170