Question: Question about sva + edgeR to identify differentially expressed genes
0
gravatar for tujuchuanli
27 days ago by
tujuchuanli40
tujuchuanli40 wrote:

Hi,

I want to identify differential genes (DEG) in TCGA dataset (cancer samples vs normal samples) by combination of edgeR and sva (get rid of batch effect by sva). The following is my code and I am not sure if it is ok or not. Could you please help me to check the code and give me some suggestions?

Thanks.

I mainly referred to the following three posts and the manual of these two packages.

  1. https://support.bioconductor.org/p/76381/

  2. Batch effect in DESeq2 - multiple factor or SVA?

  3. sva + egdeR - differential expression analysis - RNA-seq data

Below is my code:

the first step is constructing file and filtering out lowly expressed genes. expres.filter is the matrix of reads counts for each gene in each sample. In this matrix, each row is a gene and each column is a sample.

y=DGEList(counts=exprs, genes=rownames(exprs))
keep <- rowSums(cpm(y)>0.1) >= 20
y <- y[keep, , keep.lib.sizes=FALSE]
y$samples$lib.size <- colSums(y$counts)
y=calcNormFactors(y)
y.cpm.values <- cpm(y$counts, log = F)

the second step is batch effect detection by sva, there are 120 cancer samples and 20 normal samples in my dataset.

Tissue <- factor(c(rep("T",120),rep("N",20))
mod <- model.matrix(~Tissue, data=Tissue)
mod0 <- model.matrix(~1, data=Tissue)
svseq = svaseq(y.cpm.values, mod, mod0)
design <- cbind(mod, svseq$sv)

the third step is identifing DEG by edgeR

y <- estimateDisp(y, design, robust=TRUE)
fit <- glmQLFit(y, design)
lrt <- glmTreat(fit,lfc=log2(1.5))

Finally, I get my DEG list.

edger sva • 97 views
ADD COMMENTlink modified 27 days ago by ATpoint15k • written 27 days ago by tujuchuanli40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 873 users visited in the last hour