HI, I want to find differential expressed genes in TCGA datasets by edgeR. I used RNA-seq data measured by RPKM as input. The below code is basically followed edgeR manual section4.2 (https://bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf) combined with a post on Biostars (https://www.biostars.org/p/212200/).
#reading files file1="cancer.exprs.txt" file2="normal.exprs.txt" cancer.exprs = read.table(file = file1,header=T,sep="\t") normal.exprs = read.table(file = file2,header=T,sep="\t") #combine cancer and normal expression data and filter by median of expression value exprs=cbind(cancer.exprs,normal.exprs) rownames(exprs) = rownames(cancer.exprs) median = apply(exprs,1,median) exprs.filter = exprs[which(median >= 0.1),] #put the expression data into edgeR object y = DGEList(counts = exprs.filter, genes = rownames(exprs.filter)) #construct design model Tissue = factor(c(rep("T",dim(cancer.exprs)),rep("N",dim(normal.exprs)))) data.frame(Sample = colnames(y),Tissue) design = model.matrix(~0+Tissue) #compute differential expressed genes y = estimateDisp(y, design, robust=TRUE) y$common.dispersion fit = glmFit(y, design) lrt = glmLRT(fit) topTags(lrt,n=100)
cancer.exprs.txt and normal.exprs.txt are two files containing expression value measured by RPKM from cancer and normal samples. In these two files each column represents each sample and each row represents each gene. The first column and row are sample id and gene id. After running the code, I find that all genes are up-regulated genes with very huge logFC (I think it is log foldchange). Even I used cancer.exprs.txt as both cancer and normal input file, this code also consider all genes are up-regulated genes with very huge logFC. What is the wrong with me? Thanks