Question: edgeR RNAseq getting no differentially expressed genes when l compare my groups
0
gravatar for mbanoian
2.0 years ago by
mbanoian0
mbanoian0 wrote:

Hi All, l am at the beginning of my RNAseq journey and l have bumped into my first major snag.

I have adopted the methods that were presented in the publication : RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR [version 1; referees: 3 approved].

I have (3)control, (7)TBHIV, (4)TB,(5)HIV patient tissue.

I ran the script below and l keep on getting no differentially expressed genes when l perform a decideTests. :

RNAseq script

rawdata <- read.table("SteynRawCounts_Filtered_outliersremoved.txt", header=TRUE, sep="")
colnames(rawdata) <- c("Genes", rep("TBHIV",4),"HIV",rep("TB",2),"TBHIV",rep("HIV",2),"TBHIV","HIV","TB","HIV","TBHIV","TB",rep("Control",3))
library(ggplot2)
library(grid)
library(gridExtra)
library(edgeR)
y <- DGEList(counts=rawdata[,2:20], genes=rawdata[,1])
lcpm <- cpm(y, log=TRUE)
thresh <- cpm(y) > 0.5
keep <- rowSums(thresh) >= 4
y <- y[keep,]
plotMDS(y)

# recalculate library size and calculate normalization factors
y$samples$lib.size <- colSums(y$counts)
y <- calcNormFactors(y, method="TMM")
y <- estimateDisp(y, design, robust=TRUE)
plotMDS(y)

#Preparing for PCA and removing counts with zero variance
x <- y$counts
vars <- apply(y$counts, 1, var)
keep <- vars > 0
x <- x[keep,]
x <- cpm(x, log=T, prior.count=3)
samplenames <- as.factor(c(rep("TBHIV",4),"HIV",rep("TB",2),"TBHIV",rep("HIV",2),"TBHIV","HIV","TB","HIV","TBHIV","TB",rep("Control",3)))
colnames(x) <- samplenames

## pca
pca <- prcomp(t(x), scale.=TRUE)
plot(pca$x[,1], pca$x[,2], pch=19, col=samplenames)
#prepare the model design
diagnosis <- factor(c(rep("TBHIV",4),"HIV",rep("TB",2),"TBHIV",rep("HIV",2),"TBHIV","HIV","TB","HIV","TBHIV","TB",rep("Control",3)))
pathology <- factor(c("A","C","A","C","A","A","C","A","A","C","C","A","C","C","B","A","C","C","C"))
design <- model.matrix(~0+diagnosis)
colnames(design) <- gsub("diagnosis","",colnames(design))

contr.matrix <- makeContrasts(
  ControlvsHIV = HIV-Control,
  ControlvsTB = TB-Control,
  ControlvsTBHIV = TBHIV-Control,
  TBHIVvsHIV = HIV-TBHIV,
  TBHIVvsTB = TB-TBHIV,
  HIVvsTB = TB-HIV,
  levels = colnames(design))

v <- voom(y, design, plot=TRUE)
vfit <- lmFit(v,design)
vfit <- contrasts.fit(vfit, contrasts=contr.matrix)
efit <- eBayes(vfit)
plotSA(efit)
summary(decideTests(efit))

thank you for any pointers.

rna-seq R • 1.4k views
ADD COMMENTlink modified 2.0 years ago by Devon Ryan94k • written 2.0 years ago by mbanoian0

could you reformat that and keep to the relevant bits please; there seems to be multiple contrast matrices for example

ADD REPLYlink written 2.0 years ago by russhh5.2k

Hi russhh, I have re-formatted it just now. It's just a large block of code, as you can see. I don't have time to look over it in detail right now.

ADD REPLYlink written 2.0 years ago by Kevin Blighe56k

Could you confirm how and why you removed outliers and filtered the samples when you set the dataset up, please.

ADD REPLYlink written 2.0 years ago by russhh5.2k

Hi rush, l think the outliers were removed because we were not happy with the quality of the data. Sorry for the multiple contrasts, l will sort that out now.

ADD REPLYlink written 2.0 years ago by mbanoian0

Do you have cross-contrast significance for any of your features? That is, if you run topTable() on efit without specifying a particular contrast do you have any hits at adj.P < your threshold. To my mind, you've jumped into single-contrast tests too early. (aside: you might want to add library(limma) at the start of your script). If you get no hits, or very few hits, could you plot logFC versus AveCPM for each of your contrasts and see if there's any systematic biases that are wiping out any hint of differential expression.

ADD REPLYlink written 2.0 years ago by russhh5.2k

Hi rush (sorry my autocorrect keeps changing your name). Thank for you responses. I have been doing a bit of backtracking and l have noticed that the issue might be to do with the way l annotated the genes in DGEList.

The paper l am emulating used this code to annotate the genes: library(Mus.musculus) geneid <- rownames(x) genes <- select(Mus.musculus, keys=geneid, columns=c("SYMBOL", "TXCHROM"),head(genes),keytype="ENTREZID") x$genes <- genes

when l tried to do this it is using the initial gene lists (before removal of under expressed genes) i.e saying that genes has 56355 opposed to 23154 after processing. This is the code l used.

y1 <- y[keep,] genes <- rawdata_new[,1] y1$genes <- genes y1

ADD REPLYlink written 2.0 years ago by mbanoian0

Hi rush, l performed a fit using this code:

Read In Data

rawdata <- read.table("Trimmed Groups HIVTB comparison1.txt", header=TRUE, sep="") names(rawdata) group <- as.factor(c(rep("TBHIV",5),"TB","TB","TBHIV",rep("TB",4))) design <- model.matrix(~0+group) colnames(design) <- gsub("group", "", colnames(design)) design contr.matrix <- makeContrasts(TB.vs.TBHIV = TB-TBHIV, levels = colnames(design)) y <- DGEList(counts=rawdata[,2:13], genes=rawdata[,1],group=group) cpm <- cpm(y) lcpm <- cpm(y, log=TRUE) thresh <- cpm(y) > 0.5 keep <- rowSums(thresh) >= 2 dim(y) y <- y[keep,] dim(y) y <- calcNormFactors(y)

y <- estimateGLMCommonDisp(y,design) names(y) y$common.dispersion sqrt(y$common.dispersion) y <- estimateGLMTrendedDisp(y,design) summary(y$trended.dispersion) y <- estimateGLMTagwiseDisp(y,design) y$prior.df getPriorN(y,design) head(y$tagwise.dispersion) summary(y$tagwise.dispersion)

Generalized Linear Model Fit

glmfit.tgw <- glmFit(y,design,contrast=contr.matrix) lrt.tgw <- glmLRT(glmfit.tgw) topTags(lrt.tgw) summary(decideTestsDGE(lrt.tgw))

this was the out l got:

Coefficient: TBHIV genes logFC logCPM LR PValue FDR 53 KDM1A -14.38859 5.587484 1135.152 7.566662e-249 1.429321e-244 11952 DCTN5 -14.38519 5.657110 1134.195 1.221537e-248 1.429321e-244 7781 C1RL -13.99384 6.045328 1128.796 1.820943e-247 1.420457e-243 2485 CSNK2A1 -14.33314 5.615519 1127.431 3.604872e-247 2.109030e-243 5134 SPOP -13.88495 6.002822 1126.798 4.948748e-247 2.316212e-243 7181 EIF4E2 -14.45320 5.537715 1125.427 9.831066e-247 3.640486e-243 3045 URI1 -14.38462 5.569152 1125.222 1.088941e-246 3.640486e-243 12111 ZNF641 -14.53484 5.405809 1124.649 1.450661e-246 4.243546e-243 12221 SRP68 -14.21717 5.722162 1124.050 1.957926e-246 5.091043e-243 7402 UBAP2 -14.13349 5.793155 1123.103 3.145033e-246 7.360007e-243

but still no DE genes:

summary(decideTestsDGE(lrt.tgw)) TBHIV -1 23402 0 0 1 0

I had trimmed down my groups, so l am only comparing TB to TBHIV patients.

ADD REPLYlink written 2.0 years ago by mbanoian0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1142 users visited in the last hour