Question: edgeR RNAseq getting no differentially expressed genes when l compare my groups
0
gravatar for mbanoian
17 months ago by
mbanoian0
mbanoian0 wrote:

Hi All, l am at the beginning of my RNAseq journey and l have bumped into my first major snag.

I have adopted the methods that were presented in the publication : RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR [version 1; referees: 3 approved].

I have (3)control, (7)TBHIV, (4)TB,(5)HIV patient tissue.

I ran the script below and l keep on getting no differentially expressed genes when l perform a decideTests. :

RNAseq script

rawdata <- read.table("SteynRawCounts_Filtered_outliersremoved.txt", header=TRUE, sep="")
colnames(rawdata) <- c("Genes", rep("TBHIV",4),"HIV",rep("TB",2),"TBHIV",rep("HIV",2),"TBHIV","HIV","TB","HIV","TBHIV","TB",rep("Control",3))
library(ggplot2)
library(grid)
library(gridExtra)
library(edgeR)
y <- DGEList(counts=rawdata[,2:20], genes=rawdata[,1])
lcpm <- cpm(y, log=TRUE)
thresh <- cpm(y) > 0.5
keep <- rowSums(thresh) >= 4
y <- y[keep,]
plotMDS(y)

# recalculate library size and calculate normalization factors
y$samples$lib.size <- colSums(y$counts)
y <- calcNormFactors(y, method="TMM")
y <- estimateDisp(y, design, robust=TRUE)
plotMDS(y)

#Preparing for PCA and removing counts with zero variance
x <- y$counts
vars <- apply(y$counts, 1, var)
keep <- vars > 0
x <- x[keep,]
x <- cpm(x, log=T, prior.count=3)
samplenames <- as.factor(c(rep("TBHIV",4),"HIV",rep("TB",2),"TBHIV",rep("HIV",2),"TBHIV","HIV","TB","HIV","TBHIV","TB",rep("Control",3)))
colnames(x) <- samplenames

## pca
pca <- prcomp(t(x), scale.=TRUE)
plot(pca$x[,1], pca$x[,2], pch=19, col=samplenames)
#prepare the model design
diagnosis <- factor(c(rep("TBHIV",4),"HIV",rep("TB",2),"TBHIV",rep("HIV",2),"TBHIV","HIV","TB","HIV","TBHIV","TB",rep("Control",3)))
pathology <- factor(c("A","C","A","C","A","A","C","A","A","C","C","A","C","C","B","A","C","C","C"))
design <- model.matrix(~0+diagnosis)
colnames(design) <- gsub("diagnosis","",colnames(design))

contr.matrix <- makeContrasts(
  ControlvsHIV = HIV-Control,
  ControlvsTB = TB-Control,
  ControlvsTBHIV = TBHIV-Control,
  TBHIVvsHIV = HIV-TBHIV,
  TBHIVvsTB = TB-TBHIV,
  HIVvsTB = TB-HIV,
  levels = colnames(design))

v <- voom(y, design, plot=TRUE)
vfit <- lmFit(v,design)
vfit <- contrasts.fit(vfit, contrasts=contr.matrix)
efit <- eBayes(vfit)
plotSA(efit)
summary(decideTests(efit))

thank you for any pointers.

rna-seq R • 998 views
ADD COMMENTlink modified 17 months ago by Devon Ryan91k • written 17 months ago by mbanoian0

could you reformat that and keep to the relevant bits please; there seems to be multiple contrast matrices for example

ADD REPLYlink written 17 months ago by russhh4.6k

Hi russhh, I have re-formatted it just now. It's just a large block of code, as you can see. I don't have time to look over it in detail right now.

ADD REPLYlink written 17 months ago by Kevin Blighe46k

Could you confirm how and why you removed outliers and filtered the samples when you set the dataset up, please.

ADD REPLYlink written 17 months ago by russhh4.6k

Hi rush, l think the outliers were removed because we were not happy with the quality of the data. Sorry for the multiple contrasts, l will sort that out now.

ADD REPLYlink written 17 months ago by mbanoian0

Do you have cross-contrast significance for any of your features? That is, if you run topTable() on efit without specifying a particular contrast do you have any hits at adj.P < your threshold. To my mind, you've jumped into single-contrast tests too early. (aside: you might want to add library(limma) at the start of your script). If you get no hits, or very few hits, could you plot logFC versus AveCPM for each of your contrasts and see if there's any systematic biases that are wiping out any hint of differential expression.

ADD REPLYlink written 17 months ago by russhh4.6k

Hi rush (sorry my autocorrect keeps changing your name). Thank for you responses. I have been doing a bit of backtracking and l have noticed that the issue might be to do with the way l annotated the genes in DGEList.

The paper l am emulating used this code to annotate the genes: library(Mus.musculus) geneid <- rownames(x) genes <- select(Mus.musculus, keys=geneid, columns=c("SYMBOL", "TXCHROM"),head(genes),keytype="ENTREZID") x$genes <- genes

when l tried to do this it is using the initial gene lists (before removal of under expressed genes) i.e saying that genes has 56355 opposed to 23154 after processing. This is the code l used.

y1 <- y[keep,] genes <- rawdata_new[,1] y1$genes <- genes y1

ADD REPLYlink written 16 months ago by mbanoian0

Hi rush, l performed a fit using this code:

Read In Data

rawdata <- read.table("Trimmed Groups HIVTB comparison1.txt", header=TRUE, sep="") names(rawdata) group <- as.factor(c(rep("TBHIV",5),"TB","TB","TBHIV",rep("TB",4))) design <- model.matrix(~0+group) colnames(design) <- gsub("group", "", colnames(design)) design contr.matrix <- makeContrasts(TB.vs.TBHIV = TB-TBHIV, levels = colnames(design)) y <- DGEList(counts=rawdata[,2:13], genes=rawdata[,1],group=group) cpm <- cpm(y) lcpm <- cpm(y, log=TRUE) thresh <- cpm(y) > 0.5 keep <- rowSums(thresh) >= 2 dim(y) y <- y[keep,] dim(y) y <- calcNormFactors(y)

y <- estimateGLMCommonDisp(y,design) names(y) y$common.dispersion sqrt(y$common.dispersion) y <- estimateGLMTrendedDisp(y,design) summary(y$trended.dispersion) y <- estimateGLMTagwiseDisp(y,design) y$prior.df getPriorN(y,design) head(y$tagwise.dispersion) summary(y$tagwise.dispersion)

Generalized Linear Model Fit

glmfit.tgw <- glmFit(y,design,contrast=contr.matrix) lrt.tgw <- glmLRT(glmfit.tgw) topTags(lrt.tgw) summary(decideTestsDGE(lrt.tgw))

this was the out l got:

Coefficient: TBHIV genes logFC logCPM LR PValue FDR 53 KDM1A -14.38859 5.587484 1135.152 7.566662e-249 1.429321e-244 11952 DCTN5 -14.38519 5.657110 1134.195 1.221537e-248 1.429321e-244 7781 C1RL -13.99384 6.045328 1128.796 1.820943e-247 1.420457e-243 2485 CSNK2A1 -14.33314 5.615519 1127.431 3.604872e-247 2.109030e-243 5134 SPOP -13.88495 6.002822 1126.798 4.948748e-247 2.316212e-243 7181 EIF4E2 -14.45320 5.537715 1125.427 9.831066e-247 3.640486e-243 3045 URI1 -14.38462 5.569152 1125.222 1.088941e-246 3.640486e-243 12111 ZNF641 -14.53484 5.405809 1124.649 1.450661e-246 4.243546e-243 12221 SRP68 -14.21717 5.722162 1124.050 1.957926e-246 5.091043e-243 7402 UBAP2 -14.13349 5.793155 1123.103 3.145033e-246 7.360007e-243

but still no DE genes:

summary(decideTestsDGE(lrt.tgw)) TBHIV -1 23402 0 0 1 0

I had trimmed down my groups, so l am only comparing TB to TBHIV patients.

ADD REPLYlink written 16 months ago by mbanoian0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1324 users visited in the last hour