Question: Question about sva + edgeR to identify differentially expressed genes
gravatar for tujuchuanli
22 months ago by
tujuchuanli90 wrote:


I want to identify differential genes (DEG) in TCGA dataset (cancer samples vs normal samples) by combination of edgeR and sva (get rid of batch effect by sva). The following is my code and I am not sure if it is ok or not. Could you please help me to check the code and give me some suggestions?


I mainly referred to the following three posts and the manual of these two packages.


  2. Batch effect in DESeq2 - multiple factor or SVA?

  3. sva + egdeR - differential expression analysis - RNA-seq data

Below is my code:

the first step is constructing file and filtering out lowly expressed genes. expres.filter is the matrix of reads counts for each gene in each sample. In this matrix, each row is a gene and each column is a sample.

y=DGEList(counts=exprs, genes=rownames(exprs))
keep <- rowSums(cpm(y)>0.1) >= 20
y <- y[keep, , keep.lib.sizes=FALSE]
y$samples$lib.size <- colSums(y$counts)
y.cpm.values <- cpm(y$counts, log = F)

the second step is batch effect detection by sva, there are 120 cancer samples and 20 normal samples in my dataset.

Tissue <- factor(c(rep("T",120),rep("N",20))
mod <- model.matrix(~Tissue, data=Tissue)
mod0 <- model.matrix(~1, data=Tissue)
svseq = svaseq(y.cpm.values, mod, mod0)
design <- cbind(mod, svseq$sv)

the third step is identifing DEG by edgeR

y <- estimateDisp(y, design, robust=TRUE)
fit <- glmQLFit(y, design)
lrt <- glmTreat(fit,lfc=log2(1.5))

Finally, I get my DEG list.

edger sva • 1.1k views
ADD COMMENTlink modified 22 months ago by ATpoint44k • written 22 months ago by tujuchuanli90
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1514 users visited in the last hour