Question: Question about sva + edgeR to identify differentially expressed genes
0
gravatar for tujuchuanli
12 months ago by
tujuchuanli60
tujuchuanli60 wrote:

Hi,

I want to identify differential genes (DEG) in TCGA dataset (cancer samples vs normal samples) by combination of edgeR and sva (get rid of batch effect by sva). The following is my code and I am not sure if it is ok or not. Could you please help me to check the code and give me some suggestions?

Thanks.

I mainly referred to the following three posts and the manual of these two packages.

  1. https://support.bioconductor.org/p/76381/

  2. Batch effect in DESeq2 - multiple factor or SVA?

  3. sva + egdeR - differential expression analysis - RNA-seq data

Below is my code:

the first step is constructing file and filtering out lowly expressed genes. expres.filter is the matrix of reads counts for each gene in each sample. In this matrix, each row is a gene and each column is a sample.

y=DGEList(counts=exprs, genes=rownames(exprs))
keep <- rowSums(cpm(y)>0.1) >= 20
y <- y[keep, , keep.lib.sizes=FALSE]
y$samples$lib.size <- colSums(y$counts)
y=calcNormFactors(y)
y.cpm.values <- cpm(y$counts, log = F)

the second step is batch effect detection by sva, there are 120 cancer samples and 20 normal samples in my dataset.

Tissue <- factor(c(rep("T",120),rep("N",20))
mod <- model.matrix(~Tissue, data=Tissue)
mod0 <- model.matrix(~1, data=Tissue)
svseq = svaseq(y.cpm.values, mod, mod0)
design <- cbind(mod, svseq$sv)

the third step is identifing DEG by edgeR

y <- estimateDisp(y, design, robust=TRUE)
fit <- glmQLFit(y, design)
lrt <- glmTreat(fit,lfc=log2(1.5))

Finally, I get my DEG list.

edger sva • 613 views
ADD COMMENTlink modified 12 months ago by ATpoint32k • written 12 months ago by tujuchuanli60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1712 users visited in the last hour