Question: EdgeR normalization doesn't change gene count number
0
gravatar for miller93
9 weeks ago by
miller930
miller930 wrote:

Hi everyone,

I am new to RNA-seq and have I recently performed RNA-seq (Illumina NovaSeq 6000; 25M reads; PE, 100bp; Mus_musculus:mm10). I am currently performing a DE on my samples using EdgeR and noticed that my read counts do not change after normalization. Based on my understanding of how EdgeR normalization works, the read counts should change. Should I use a different normalization approach? Can someone help determine what I am doing incorrectly? Below I've included the R commands I've used to perform the analysis and the output.

COMMAND LINES IN R:

head(fc_res)

row.names(fc_res) <- fc_res$Gene

fc_res <- fc_res[-c(1,20)]

group <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6))

y <- DGEList(counts=fc_res, group=group)

y$samples

keep <- filterByExpr(y, design)

filt1 <- y[keep,,keep.lib.sizes=FALSE]

dim(filt1)

ynorm <- calcNormFactors(filt1)

ynorm$samples

plotMDS(ynorm)

OUTPUT:

         group lib.size norm.factors
S1_REP1             1 21780263    0.9964533

S1_REP2             1 19985191    0.9930817

S1_REP3             1 22149855    0.9885835

S2_REP1                2 23139231    0.9778646

S2_REP2                2 23631094    0.9808015

S3_REP3                2 25284278    0.9769698

S3_REP1     3 20292937    0.9350821

S3_REP2     3 25694663    0.9335373

S3_REP3     3 20825373    0.9310621

S4_REP1        4 23995722    0.9514329

S4_REP2        4 23429701    0.9364913

S4_REP3        4 22528422    0.9643743

S5_REP1     5 27466471    1.1070687

S5_REP2     5 29807414    1.0291615

S5_REP3     5 20450417    1.0667385

S6_REP1        6 24783768    1.0259821

S6_REP2        6 21733797    1.1276330

S6_REP3        6 18539629    1.1109447

Thank you, David

rna-seq next-gen R • 138 views
ADD COMMENTlink modified 9 weeks ago by brianj.park50 • written 9 weeks ago by miller930
0
gravatar for brianj.park
9 weeks ago by
brianj.park50
Montréal, Canada
brianj.park50 wrote:

You are not using discrete read counts for differential expression. There is a function called cpm that transforms your data into continuous log scale and you use these values for downstream differential expression. If you run lcpm <- cpm(y, log=TRUE) and lcpm <- cpm(ynorm, log=TRUE) you will see that the result reflects differences after your normalization.

Resource for edgeR here: https://www.bioconductor.org/packages/devel/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html, specifically section 5.1 for lcpm calculation and 5.3 for normalization.

ADD COMMENTlink modified 9 weeks ago • written 9 weeks ago by brianj.park50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2398 users visited in the last hour