Question

Help with differential expression microarray data using oligo: adjusted p values are very high

0

Entering edit mode

3.7 years ago

gina.abdelaal • 0

Hello Biostars I have been using the following script for differential expression of affymetrix microarray data for several datasets, however I keep getting very high adjusted p values so I believe I am missing out a step. Apologies in advance for the very long post but I think it's important to show all my code.

 library(GEOquery)
library(limma)
library(oligo)
library(dplyr)
library(matrixStats)
library(biomaRt)

########## file download and normalisation####################################
setwd("~/Documents/Gina")

getGEOSuppFiles("GSE112841")

list.files("GSE112841")

untar("GSE112841/GSE112841_RAW.tar", exdir = "GSE112841/CEL")

celfiles <- list.files("GSE112841/CEL", full = TRUE)

rawData <- read.celfiles(celfiles)

normData <- rma(rawData)

###### filtering for poorly expressed probes##################################

data_medians <- rowMedians(exprs(normData))

man_threshold <- 4

no_samples <- table(man_threshold)

samples_cutoff <- min(no_samples)

idx_man_threshold <- apply(Biobase::exprs(normData), 1,
                          function(x){
                               sum(x > man_threshold) >= samples_cutoff})
table(idx_man_threshold)

filtered_normData <- subset(normData, idx_man_threshold)

dim(filtered_normData) # 24154 genes

dim(normData) # 53617 genes

###### gene annotation#######################################################

mart <- useMart("ENSEMBL_MART_ENSEMBL")

mart <- useDataset("hsapiens_gene_ensembl", mart)

Glist <- getBM(attributes = c("affy_hugene_2_0_st_v1",
                              "external_gene_name", 
                              "gene_biotype"), 
               filters = "affy_hugene_2_0_st_v1", 
               values =rownames(exprs(filtered_normData)),
               mart = mart)

Biomart <- Glist[which(Glist$gene_biotype=="protein_coding"),]

DF <- as.data.frame(exprs(filtered_normData))

DF$Probe_Name <- rownames(DF)

names(DF) <- c("shNDRG1_1", "shNDRG1_2", "shNDRG1_3", 
               "Control_1", "Control_2", "Control_3", 
               "Probe_Name")

Biomart <- Biomart[! duplicated(Biomart$external_gene_name),]

Biomart <- Biomart[! duplicated(Biomart$affy_hugene_2_0_st_v1),]

DF <- DF[! duplicated(DF$Probe_Name),]

X <- merge(DF, Biomart, by.x = "Probe_Name", by.y = "affy_hugene_2_0_st_v1")

rownames(X) <- X$external_gene_name

X$Probe_Name <- X$Probe_Name  <- X$gene_biotype <- X$external_gene_name <- NULL

########### differential expression##########################################
Z <- rep(c("shNDRG1", "Control"), each = 3)

f = factor(Z,levels=c("shNDRG1","Control"))

design = model.matrix(~ 0 + f)

colnames(design) = c("shNDRG1","Control")

data.fit = lmFit(X, design)

contrast.matrix = makeContrasts(shNDRG1-Control,levels=design)

data.fit.con = contrasts.fit(data.fit,contrast.matrix)

data.fit.eb = eBayes(data.fit.con)

tab = topTable(data.fit.eb, number=Inf,adjust.method="fdr")

    head(tab)


             logFC     AveExpr         t      P.Value adj.P.Val         B
VLDLR       1.1195324  7.216597  8.877460 4.023989e-05 0.4014946 -2.950292
CCL28       1.4088015  4.193678  8.015200 7.887233e-05 0.4014946 -2.995123
SERHL2     -1.0807675  7.651956 -7.503533 1.210243e-04 0.4014946 -3.027914
CASP14     -1.2332972  4.224030 -7.443782 1.274233e-04 0.4014946 -3.032104
EPHB6      -1.1213133  5.036059 -7.232237 1.533330e-04 0.4014946 -3.047611
SRARP      -1.3676657  5.549409 -7.226247 1.541483e-04 0.4014946 -3.048066
NDRG1      -1.1750064  7.820946 -7.224252 1.544210e-04 0.4014946 -3.048217

Any help with sorting out the adjusted p values will be most appreciated. Thanks.

R microarray oligo limma • 1.3k views

ADD COMMENT • link 3.7 years ago by gina.abdelaal • 0

score 3 · Answer 1 · 2020-07-30

3

Entering edit mode

3.7 years ago

Kevin Blighe 87k

Not much to say, really. If nothing passes FDR correction, then we cannot solve that. I do not see anything unusual with your code. Just double check various things such as the fact that Z should match perfectly with the columns of your input expression data.

If you want, please paste the first few entries from your results table.

ADD COMMENT • link 3.7 years ago by Kevin Blighe 87k

0

Entering edit mode

I have added in the results table.

ADD REPLY • link 3.7 years ago by gina.abdelaal • 0

1

Entering edit mode

Thanks for adding. It looks like a genuine result - the absolute fold-changes are not even that high. Were you expecting something to be statistically significantly different?

If this is the genuine result based on this small 3 vs 3 comparison, then you'd require a larger cohort in order to find a statistically significantly different effect in this disease / condition of study.

ADD REPLY • link 3.7 years ago by Kevin Blighe 87k

0

Entering edit mode

Thank you for your response. I was hoping there would be some DE genes and that the gene knockdown of NDRG1 was successful. But it looks like the difference between the control and knockdown is not significant enough. I was just curious if it was my code or the data, and it seems like the latter was the issue. I will find a larger dataset .

ADD REPLY • link 3.7 years ago by gina.abdelaal • 0

1

Entering edit mode

Check that there are not other probes targeting this gene, which could hamper the statistical inferences made for it. For example, when you normalise, you can try other options with rma():

ADD REPLY • link 3.7 years ago by Kevin Blighe 87k

0

Entering edit mode

I have checked and I found only one probe for NDRG1

affy_hugene_2_0_st_v1 ensembl_gene_id external_gene_name
1              17081401 ENSG00000104419              NDRG1
                                                      description   gene_biotype
1 N-myc downstream regulated 1 [Source:HGNC Symbol;Acc:HGNC:7679] protein_coding

ADD REPLY • link 3.7 years ago by gina.abdelaal • 0

1

Entering edit mode

I see. Your options are limited, it seems. Depending on what you want to do with this data, you could report nominal p-values, but this would be extremely difficult to publish.

ADD REPLY • link 3.7 years ago by Kevin Blighe 87k