Question: Help with differential expression microarray data using oligo: adjusted p values are very high
0
gravatar for gina.abdelaal
5 months ago by
gina.abdelaal0 wrote:

Hello Biostars I have been using the following script for differential expression of affymetrix microarray data for several datasets, however I keep getting very high adjusted p values so I believe I am missing out a step. Apologies in advance for the very long post but I think it's important to show all my code.

 library(GEOquery)
library(limma)
library(oligo)
library(dplyr)
library(matrixStats)
library(biomaRt)

########## file download and normalisation####################################
setwd("~/Documents/Gina")

getGEOSuppFiles("GSE112841")

list.files("GSE112841")

untar("GSE112841/GSE112841_RAW.tar", exdir = "GSE112841/CEL")

celfiles <- list.files("GSE112841/CEL", full = TRUE)

rawData <- read.celfiles(celfiles)

normData <- rma(rawData)

###### filtering for poorly expressed probes##################################

data_medians <- rowMedians(exprs(normData))

man_threshold <- 4

no_samples <- table(man_threshold)

samples_cutoff <- min(no_samples)

idx_man_threshold <- apply(Biobase::exprs(normData), 1,
                          function(x){
                               sum(x > man_threshold) >= samples_cutoff})
table(idx_man_threshold)

filtered_normData <- subset(normData, idx_man_threshold)

dim(filtered_normData) # 24154 genes

dim(normData) # 53617 genes

###### gene annotation#######################################################

mart <- useMart("ENSEMBL_MART_ENSEMBL")

mart <- useDataset("hsapiens_gene_ensembl", mart)

Glist <- getBM(attributes = c("affy_hugene_2_0_st_v1",
                              "external_gene_name", 
                              "gene_biotype"), 
               filters = "affy_hugene_2_0_st_v1", 
               values =rownames(exprs(filtered_normData)),
               mart = mart)

Biomart <- Glist[which(Glist$gene_biotype=="protein_coding"),]

DF <- as.data.frame(exprs(filtered_normData))

DF$Probe_Name <- rownames(DF)

names(DF) <- c("shNDRG1_1", "shNDRG1_2", "shNDRG1_3", 
               "Control_1", "Control_2", "Control_3", 
               "Probe_Name")

Biomart <- Biomart[! duplicated(Biomart$external_gene_name),]

Biomart <- Biomart[! duplicated(Biomart$affy_hugene_2_0_st_v1),]

DF <- DF[! duplicated(DF$Probe_Name),]

X <- merge(DF, Biomart, by.x = "Probe_Name", by.y = "affy_hugene_2_0_st_v1")

rownames(X) <- X$external_gene_name

X$Probe_Name <- X$Probe_Name  <- X$gene_biotype <- X$external_gene_name <- NULL

########### differential expression##########################################
Z <- rep(c("shNDRG1", "Control"), each = 3)

f = factor(Z,levels=c("shNDRG1","Control"))

design = model.matrix(~ 0 + f)

colnames(design) = c("shNDRG1","Control")

data.fit = lmFit(X, design)

contrast.matrix = makeContrasts(shNDRG1-Control,levels=design)

data.fit.con = contrasts.fit(data.fit,contrast.matrix)

data.fit.eb = eBayes(data.fit.con)

tab = topTable(data.fit.eb, number=Inf,adjust.method="fdr")

    head(tab)


             logFC     AveExpr         t      P.Value adj.P.Val         B
VLDLR       1.1195324  7.216597  8.877460 4.023989e-05 0.4014946 -2.950292
CCL28       1.4088015  4.193678  8.015200 7.887233e-05 0.4014946 -2.995123
SERHL2     -1.0807675  7.651956 -7.503533 1.210243e-04 0.4014946 -3.027914
CASP14     -1.2332972  4.224030 -7.443782 1.274233e-04 0.4014946 -3.032104
EPHB6      -1.1213133  5.036059 -7.232237 1.533330e-04 0.4014946 -3.047611
SRARP      -1.3676657  5.549409 -7.226247 1.541483e-04 0.4014946 -3.048066
NDRG1      -1.1750064  7.820946 -7.224252 1.544210e-04 0.4014946 -3.048217

Any help with sorting out the adjusted p values will be most appreciated. Thanks.

limma microarray oligo R • 320 views
ADD COMMENTlink modified 5 months ago • written 5 months ago by gina.abdelaal0
3
gravatar for Kevin Blighe
5 months ago by
Kevin Blighe69k
Republic of Ireland
Kevin Blighe69k wrote:

Not much to say, really. If nothing passes FDR correction, then we cannot solve that. I do not see anything unusual with your code. Just double check various things such as the fact that Z should match perfectly with the columns of your input expression data.

If you want, please paste the first few entries from your results table.

ADD COMMENTlink written 5 months ago by Kevin Blighe69k

I have added in the results table.

ADD REPLYlink written 5 months ago by gina.abdelaal0
1

Thanks for adding. It looks like a genuine result - the absolute fold-changes are not even that high. Were you expecting something to be statistically significantly different?

If this is the genuine result based on this small 3 vs 3 comparison, then you'd require a larger cohort in order to find a statistically significantly different effect in this disease / condition of study.

ADD REPLYlink written 5 months ago by Kevin Blighe69k

Thank you for your response. I was hoping there would be some DE genes and that the gene knockdown of NDRG1 was successful. But it looks like the difference between the control and knockdown is not significant enough. I was just curious if it was my code or the data, and it seems like the latter was the issue. I will find a larger dataset .

ADD REPLYlink modified 5 months ago • written 5 months ago by gina.abdelaal0
1

Check that there are not other probes targeting this gene, which could hamper the statistical inferences made for it. For example, when you normalise, you can try other options with rma():

ADD REPLYlink written 5 months ago by Kevin Blighe69k

I have checked and I found only one probe for NDRG1

affy_hugene_2_0_st_v1 ensembl_gene_id external_gene_name
1              17081401 ENSG00000104419              NDRG1
                                                      description   gene_biotype
1 N-myc downstream regulated 1 [Source:HGNC Symbol;Acc:HGNC:7679] protein_coding
ADD REPLYlink modified 5 months ago • written 5 months ago by gina.abdelaal0
1

I see. Your options are limited, it seems. Depending on what you want to do with this data, you could report nominal p-values, but this would be extremely difficult to publish.

ADD REPLYlink written 5 months ago by Kevin Blighe69k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1695 users visited in the last hour
_