Question

EdgeR normalization doesn't change gene count number

0

Entering edit mode

3.4 years ago

miller93 • 0

Hi everyone,

I am new to RNA-seq and have I recently performed RNA-seq (Illumina NovaSeq 6000; 25M reads; PE, 100bp; Mus_musculus:mm10). I am currently performing a DE on my samples using EdgeR and noticed that my read counts do not change after normalization. Based on my understanding of how EdgeR normalization works, the read counts should change. Should I use a different normalization approach? Can someone help determine what I am doing incorrectly? Below I've included the R commands I've used to perform the analysis and the output.

COMMAND LINES IN R:

head(fc_res)

row.names(fc_res) <- fc_res$Gene

fc_res <- fc_res[-c(1,20)]

group <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6))

y <- DGEList(counts=fc_res, group=group)

y$samples

keep <- filterByExpr(y, design)

filt1 <- y[keep,,keep.lib.sizes=FALSE]

dim(filt1)

ynorm <- calcNormFactors(filt1)

ynorm$samples

plotMDS(ynorm)

OUTPUT:

         group lib.size norm.factors
S1_REP1             1 21780263    0.9964533

S1_REP2             1 19985191    0.9930817

S1_REP3             1 22149855    0.9885835

S2_REP1                2 23139231    0.9778646

S2_REP2                2 23631094    0.9808015

S3_REP3                2 25284278    0.9769698

S3_REP1     3 20292937    0.9350821

S3_REP2     3 25694663    0.9335373

S3_REP3     3 20825373    0.9310621

S4_REP1        4 23995722    0.9514329

S4_REP2        4 23429701    0.9364913

S4_REP3        4 22528422    0.9643743

S5_REP1     5 27466471    1.1070687

S5_REP2     5 29807414    1.0291615

S5_REP3     5 20450417    1.0667385

S6_REP1        6 24783768    1.0259821

S6_REP2        6 21733797    1.1276330

S6_REP3        6 18539629    1.1109447

Thank you, David

RNA-Seq R next-gen • 704 views

ADD COMMENT • link updated 3.4 years ago by brianj.park ▴ 60 • written 3.4 years ago by miller93 • 0

score 0 · Answer 1 · 2020-11-12

You are not using discrete read counts for differential expression. There is a function called cpm that transforms your data into continuous log scale and you use these values for downstream differential expression. If you run lcpm <- cpm(y, log=TRUE) and lcpm <- cpm(ynorm, log=TRUE) you will see that the result reflects differences after your normalization.

Resource for edgeR here: https://www.bioconductor.org/packages/devel/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html, specifically section 5.1 for lcpm calculation and 5.3 for normalization.