Question: Help with differential expression microarray data using oligo: adjusted p values are very high
0
gravatar for gina.abdelaal
13 days ago by
gina.abdelaal0 wrote:

Hello Biostars I have been using the following script for differential expression of affymetrix microarray data for several datasets, however I keep getting very high adjusted p values so I believe I am missing out a step. Apologies in advance for the very long post but I think it's important to show all my code.

 library(GEOquery)
library(limma)
library(oligo)
library(dplyr)
library(matrixStats)
library(biomaRt)

########## file download and normalisation####################################
setwd("~/Documents/Gina")

getGEOSuppFiles("GSE112841")

list.files("GSE112841")

untar("GSE112841/GSE112841_RAW.tar", exdir = "GSE112841/CEL")

celfiles <- list.files("GSE112841/CEL", full = TRUE)

rawData <- read.celfiles(celfiles)

normData <- rma(rawData)

###### filtering for poorly expressed probes##################################

data_medians <- rowMedians(exprs(normData))

man_threshold <- 4

no_samples <- table(man_threshold)

samples_cutoff <- min(no_samples)

idx_man_threshold <- apply(Biobase::exprs(normData), 1,
                          function(x){
                               sum(x > man_threshold) >= samples_cutoff})
table(idx_man_threshold)

filtered_normData <- subset(normData, idx_man_threshold)

dim(filtered_normData) # 24154 genes

dim(normData) # 53617 genes

###### gene annotation#######################################################

mart <- useMart("ENSEMBL_MART_ENSEMBL")

mart <- useDataset("hsapiens_gene_ensembl", mart)

Glist <- getBM(attributes = c("affy_hugene_2_0_st_v1",
                              "external_gene_name", 
                              "gene_biotype"), 
               filters = "affy_hugene_2_0_st_v1", 
               values =rownames(exprs(filtered_normData)),
               mart = mart)

Biomart <- Glist[which(Glist$gene_biotype=="protein_coding"),]

DF <- as.data.frame(exprs(filtered_normData))

DF$Probe_Name <- rownames(DF)

names(DF) <- c("shNDRG1_1", "shNDRG1_2", "shNDRG1_3", 
               "Control_1", "Control_2", "Control_3", 
               "Probe_Name")

Biomart <- Biomart[! duplicated(Biomart$external_gene_name),]

Biomart <- Biomart[! duplicated(Biomart$affy_hugene_2_0_st_v1),]

DF <- DF[! duplicated(DF$Probe_Name),]

X <- merge(DF, Biomart, by.x = "Probe_Name", by.y = "affy_hugene_2_0_st_v1")

rownames(X) <- X$external_gene_name

X$Probe_Name <- X$Probe_Name  <- X$gene_biotype <- X$external_gene_name <- NULL

########### differential expression##########################################
Z <- rep(c("shNDRG1", "Control"), each = 3)

f = factor(Z,levels=c("shNDRG1","Control"))

design = model.matrix(~ 0 + f)

colnames(design) = c("shNDRG1","Control")

data.fit = lmFit(X, design)

contrast.matrix = makeContrasts(shNDRG1-Control,levels=design)

data.fit.con = contrasts.fit(data.fit,contrast.matrix)

data.fit.eb = eBayes(data.fit.con)

tab = topTable(data.fit.eb, number=Inf,adjust.method="fdr")

    head(tab)


             logFC     AveExpr         t      P.Value adj.P.Val         B
VLDLR       1.1195324  7.216597  8.877460 4.023989e-05 0.4014946 -2.950292
CCL28       1.4088015  4.193678  8.015200 7.887233e-05 0.4014946 -2.995123
SERHL2     -1.0807675  7.651956 -7.503533 1.210243e-04 0.4014946 -3.027914
CASP14     -1.2332972  4.224030 -7.443782 1.274233e-04 0.4014946 -3.032104
EPHB6      -1.1213133  5.036059 -7.232237 1.533330e-04 0.4014946 -3.047611
SRARP      -1.3676657  5.549409 -7.226247 1.541483e-04 0.4014946 -3.048066
NDRG1      -1.1750064  7.820946 -7.224252 1.544210e-04 0.4014946 -3.048217

Any help with sorting out the adjusted p values will be most appreciated. Thanks.

limma microarray oligo R • 146 views
ADD COMMENTlink modified 12 days ago • written 13 days ago by gina.abdelaal0
3
gravatar for Kevin Blighe
13 days ago by
Kevin Blighe63k
Kevin Blighe63k wrote:

Not much to say, really. If nothing passes FDR correction, then we cannot solve that. I do not see anything unusual with your code. Just double check various things such as the fact that Z should match perfectly with the columns of your input expression data.

If you want, please paste the first few entries from your results table.

ADD COMMENTlink written 13 days ago by Kevin Blighe63k

I have added in the results table.

ADD REPLYlink written 12 days ago by gina.abdelaal0
1

Thanks for adding. It looks like a genuine result - the absolute fold-changes are not even that high. Were you expecting something to be statistically significantly different?

If this is the genuine result based on this small 3 vs 3 comparison, then you'd require a larger cohort in order to find a statistically significantly different effect in this disease / condition of study.

ADD REPLYlink written 12 days ago by Kevin Blighe63k

Thank you for your response. I was hoping there would be some DE genes and that the gene knockdown of NDRG1 was successful. But it looks like the difference between the control and knockdown is not significant enough. I was just curious if it was my code or the data, and it seems like the latter was the issue. I will find a larger dataset .

ADD REPLYlink modified 12 days ago • written 12 days ago by gina.abdelaal0
1

Check that there are not other probes targeting this gene, which could hamper the statistical inferences made for it. For example, when you normalise, you can try other options with rma():

ADD REPLYlink written 12 days ago by Kevin Blighe63k

I have checked and I found only one probe for NDRG1

affy_hugene_2_0_st_v1 ensembl_gene_id external_gene_name
1              17081401 ENSG00000104419              NDRG1
                                                      description   gene_biotype
1 N-myc downstream regulated 1 [Source:HGNC Symbol;Acc:HGNC:7679] protein_coding
ADD REPLYlink modified 11 days ago • written 11 days ago by gina.abdelaal0
1

I see. Your options are limited, it seems. Depending on what you want to do with this data, you could report nominal p-values, but this would be extremely difficult to publish.

ADD REPLYlink written 11 days ago by Kevin Blighe63k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1641 users visited in the last hour