Question: Differential expression of RNA seq data using EdgeR
2
gravatar for GuriGuri1565
2.5 years ago by
GuriGuri156520
GuriGuri156520 wrote:

Hi everyone,

I'm using edgeR for differential expression genes analysis. I make 4 groups that get differential expression genes each and then compare the number of genes. But there are no common genes between 4 groups. the result is not ideal. Please, Could you help me to check on the R codes?

I have 4 groups(18H, 18L, 30H, 30L).

each group consists of 10 samples(18H_1,..., 18H_10, 18L_1, ..., 18L_10, 30H_1,..., 30H_10, 30L_1, ..., 30L_10).

I made 4 groups to get genes that differentially express between two groups(18H18L, 18H30H, 18L30L, 30H30L).

and I made venn diagram to compare the number of genes between 4 groups.

I made several scripts. But there are no common genes at all scripts.

#1
genecounts <- read.table("18H18L.txt", row.names = 1)

colnames(genecounts) <- c(paste("18H",1:10,sep = "_"), paste("18L",1:10,sep="_"))

condition <-c(rep("H", 10), rep("L", 10))

dge <- DGEList(counts = genecounts, group = condition)

countsPerMillion <- cpm(dge)

countCheck <- countsPerMillion > 1

keep <- which(rowSums(countCheck)>=2)

dge <- dge[keep,]

dge <- calcNormFactors(dge)

design <- model.matrix(~ condition + 0, data = dge$samples)

colnames(design) = c("High", "Low")

disp <- estimateGLMCommonDisp(dge, design)

disp <- estimateGLMTrendedDisp(disp, design)

disp <- estimateGLMTagwiseDisp(disp, design)

fit <- glmFit(disp, design)

lrt <- glmLRT(fit)

res <- as.data.frame(lrt$table)

res <- cbind(res, FDR = p.adjust(res$PValue, method = "BH"))

fc1 <- res[res[,1] >= 1 | res[,1] <= -1,]

fc1p <- fc1[fc1$PValue <= 0.05,]

fc1q <- fc1[fc1$FDR <= 0.05,]


#2 
genecounts <- read.table("18H18L.txt", row.names = 1)

colnames(genecounts) <- c(paste("18H",1:10,sep = "_"),paste("18L",1:10,sep="_"))

condition <-c(rep("H", 10), rep("L", 10))

dge <- DGEList(counts = genecounts, group = condition)

countsPerMillion <- cpm(dge)

countCheck <- countsPerMillion > 0.5

keep <- which(rowSums(countCheck)>=2)

dge <- dge[keep,]

dge <- calcNormFactors(dge)

design <- model.matrix(~ condition + 0, data = dge$samples)

colnames(design) = c("High", "Low")

disp <- estimateGLMCommonDisp(dge, design)

disp <- estimateGLMTrendedDisp(disp, design)

disp <- estimateGLMTagwiseDisp(disp, design)

fit <- glmFit(disp, design)

highlow <- makeContrasts(High-Low,levels = design)

lrt <- glmLRT(fit, contrast =highlow)

res <- as.data.frame(lrt$table)

res <- cbind(res, FDR = p.adjust(res$PValue, method = "BH"))

fc1 <- res[res[,1] >= 1 | res[,1] <= -1,]

fc1p <- fc1[fc1$PValue <= 0.05,]

fc1q <- fc1[fc1$FDR <= 0.05,]

write.table(fc1p,"fc1p_18H18L_1.txt", quote = F, sep = "\t", row.names = T, col.names = T)



#3
genecounts <- read.table("18H18L.txt", row.names = 1)

colnames(genecounts) <- c(paste("18H",1:10,sep = "_"),paste("18L",1:10,sep="_"))

condition <-c(rep("H", 10), rep("L", 10))

dge <- DGEList(counts = genecounts, group = condition)

countsPerMillion <- cpm(dge)

countCheck <- countsPerMillion > 0.5

keep <- which(rowSums(countCheck)>=2)

dge <- dge[keep,]

dge <- calcNormFactors(dge)

design <- model.matrix(~ condition + 0, data = dge$samples)

colnames(design) = c("High", "Low")

disp <- estimateGLMCommonDisp(dge, design)

disp <- estimateGLMTrendedDisp(disp, design)

disp <- estimateGLMTagwiseDisp(disp, design)

fit <- glmFit(disp, design)

lrt <- glmLRT(fit)

res <- as.data.frame(lrt$table)

res <- cbind(res, FDR = p.adjust(res$PValue, method = "BH"))

fc1 <- res[res[,1] >= 1 | res[,1] <= -1,]

fc1p <- fc1[fc1$PValue <= 0.05,]

fc1q <- fc1[fc1$FDR <= 0.05,]

#2-1
library(gplots)

data1 <- read.table("fc1p_18H18L_1.txt", header = T)

data2 <- read.table("fc1p_18H30H_1.txt", header = T)

data3 <- read.table("fc1p_18L30L_1.txt", header = T)

data4 <- read.table("fc1p_30H30L_1.txt", header = T)

test1 <- data1[,1]

test2 <- data2[,1]

test3 <- data3[,1]

test4 <- data4[,1]

venn(list(A = test1, B = test2, C = test3, D = test4))

A group = 105, B group = 1778, C group = 1665, D group = 47

There are no common genes between groups.

If there is any mistake or question, please let me know. I look forward to hearing from you.

Thanks!

limma edger rna-seq R gene • 2.0k views
ADD COMMENTlink modified 2.5 years ago by genomax87k • written 2.5 years ago by GuriGuri156520
2

Do all your groups belong to the same experiment? Why are you fitting several models with subsets of the data? Fit one model with all data and use makeContrasts to perform differential analyses for the contrasts of interest.

condition <-factor( c( rep("H18", 10), 
                       rep("L18", 10), 
                       rep("H30", 10), 
                       rep("L30", 10) ) )

design <- model.matrix( ~ 0 + condition )

colnames(design) <- levels( condition )

contrasts <- makeContrasts( H18.vs.L18 = H18 - L18,
                            H30.vs.L30 = H30 - L30,
                            H18.vs.H30 = H18 - H30,
                            L18.vs.L30 = L18 - L30,
                            levels = design )

Then, to test for differences in H18 vs L18:

lrt <- glmLRT(fit, contrast = contrasts[,1] )

In addition, from what I understood from 2-1, sections 1, 2 and 3 are wrong or incomplete: sections 2 and 3 are (as far as I can tell) identical, and there should be a section 4.

ADD REPLYlink written 2.5 years ago by h.mon30k

I’m so sorry for late response. Thank you~

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by GuriGuri156520
1

isn't it possible that there simply aren't any overlapping genes?

ADD REPLYlink written 2.5 years ago by russhh5.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 909 users visited in the last hour