EdgeR multiple samples
0
0
Entering edit mode
3 months ago
Pahi • 0

Hi!

Recently I'm working on an R pipeline for RNA-seq analysis. My goal is to create a script that compares my samples to each other and compute the DE.

So far I created my script and its working well. But when I implemented loops which would make my life easier, I noticed that the result are different. To be precise: First I did the comparisons sample pair by pair, than with the loops I loaded in all my data and made the comparison sample pair by pair.

What I noticed is everything goes as it should be, than when I reach the part where I call the

fit <- glmQLFit(dge,dge$design,robust = TRUE) function, the a little difference shows up: The first pic shows when ALL the samples loaded in, the second is when only 2 group (6 samples) is loaded in. After that when I want to make a res <- glmQLFTest(fit,contrast = c(-1, 0,1,0)) it obviously shows different result. Edger uses different calculations based on the loaded samples? Also it makes the DE totally different when I try to visualize it. For the 2 example: One plot shows a lot of DE genes (this is where only 2 groups (6 samples) were compared), and the other only shows 2 (this is where all the samples processed together). My code looks the same even if I use 2 group samples or multiple group samples. What could be the problem? Thank you for your time and help! EdgeR R • 271 views ADD COMMENT 2 Entering edit mode If I understand correctly, your question is basically that you get a slightly different result if you calculate DE using all your data versus subsets of your data. The reason the results will not be identical is because the variance estimate for a gene will be much more informed in a larger data set, across more conditions or samples, than in a smaller data set. It's not so much a problem as a feature. Better estimates of variance allow better evaluation of significance. ADD REPLY 0 Entering edit mode My code : group <- target$group

dge <- DGEList(counts=reads, genes = gene_name, group = group)
dge$design <- model.matrix(~0+group, data=dge$samples)

dge$genes$ENTREZID <- mapIds(org.Dm.eg.db, dge$genes$genes, keytype = "ENSEMBL",column="ENTREZID")
dge$genes$Symbol <- mapIds(org.Dm.eg.db, dge$genes$genes, keytype = "ENSEMBL", column="SYMBOL")

dge <- dge[!is.na(dge$genes$ENTREZID),]
dge <- dge[!is.na(dge$genes$Symbol),]

keep_dge <- filterByExpr(dge,dge$design) table(keep_dge) dge <- dge[keep_dge,,keep.lib.sizes=FALSE] dge <- calcNormFactors(dge) dge <- estimateDisp(dge) dge_df <- as.data.frame(dge) fit2 <- glmQLFit(dge,dge$design,robust = TRUE)
head(fit$coefficients) colsToTest <- grep(varInt,colnames(fit2$design))
namesToTest <- paste(gsub(varInt,"",colnames(fit2$design)[colsToTest]),"_vs_",condRef) results <- list() lrt_list <- list() if (length(colsToTest)>=2){ colnames <- gsub(varInt,"",colnames(fit2$design))   for (comp in combn(length(colsToTest),2,simplify=FALSE)){
contrast <- numeric(ncol(dge$design)) contrast[colsToTest[comp[1:2]]] <- c(-1,1) namecomp <- paste0(colnames[colsToTest[comp[2]]],"_vs_",colnames[colsToTest[comp[1]]]) cat(paste0("Comparison ",gsub("_"," ",namecomp),": testing contrast (",paste(contrast,collapse=", "),")"),"\n") lrt <- glmQLFTest(fit2, contrast=contrast) results[[namecomp]] <- topTags(lrt,n=nrow(dge$counts))\$table
lrt_list[[namecomp]] <- lrt      } }