Question: After Getting Normalization Factors Via edgeR, What To Do Next For Normalization?
Ngsnewbie wrote (5.8 years ago):

Dear All,

This is a pretty simple question, but I am getting confused.

I have raw count data and I am using edgeR (TMM).

I got the normalization factors from the calcNormFactors function in the edgeR package. I also got final normalized values using the cpm function.

Now, when I divide the raw counts by the corresponding library's normalization factor (I also tried multiplying), the values I get are not the same as those returned directly by the cpm function.

In the case of DESeq, it seems pretty simple: the raw count is just divided by the library size.

What is the next calculation after getting the scaling factors (here, the values from calcNormFactors)?

head(tab)
ID    S1    S2
CA_gi|502076645|ref|XM_004485358.1|    4    2
CA_gi|502076654|ref|XM_004485361.1|    0    8
CA_gi|502076657|ref|XM_004485362.1|    65    62
CA_gi|502076684|ref|XM_004485369.1|    0    2
CA_gi|502076687|ref|XM_004485370.1|    26    55
CA_gi|502076690|ref|XM_004485371.1|    119    252
CA_gi|502076693|ref|XM_004485372.1|    68    70
CA_gi|502076703|ref|XM_004485375.1|    12    20
CA_gi|502076706|ref|XM_004485376.1|    0    2


 edger<-calcNormFactors(tab)
 edger
[1] 1.0536160 0.9491124


 head(cpm(tab))
ID    S1    S2
CA_gi|502076645|ref|XM_004485358.1|   3.90172   1.786435
CA_gi|502076654|ref|XM_004485361.1|   0.00000   7.145741
CA_gi|502076657|ref|XM_004485362.1|  63.40294  55.379492
CA_gi|502076684|ref|XM_004485369.1|   0.00000   1.786435
CA_gi|502076687|ref|XM_004485370.1|  25.36118  49.126969
CA_gi|502076690|ref|XM_004485371.1| 116.07616 225.090840

For example, for the first gene in sample S1:

4 / 1.0536160 = 3.79644956 (not equal to 3.90172), and 4 * 1.0536160 = 4.214464 (again not equal to 3.90172)

Damian Kao wrote (5.8 years ago):

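The TMM-normalized value is:

count / (library size * normalization factor)

Multiply that by a million to get CPM. The product of library size and normalization factor is what edgeR calls the effective library size.

It is not:

count / normalization factor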
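To make the arithmetic concrete, here is a minimal base-R sketch of that formula. It is only an illustration under stated assumptions: the counts are the nine rows shown in the question, the gene names are placeholders, and the library sizes are the column sums of just those nine rows (the real library sizes are not given in the thread), so the numbers will not reproduce the cpm output posted above. The normalization factors are the ones printed in the question. As far as I know, cpm() applied to a plain matrix uses only the column sums, while cpm() applied to a DGEList uses the effective library sizes, which may be worth checking if the two disagree.

```r
# Toy counts: the nine rows shown in the question (gene names are placeholders).
counts <- matrix(
  c(4, 0, 65, 0, 26, 119, 68, 12, 0,
    2, 8, 62, 2, 55, 252, 70, 20, 2),
  ncol = 2,
  dimnames = list(paste0("gene", 1:9), c("S1", "S2"))
)

# Normalization factors as printed by calcNormFactors in the question.
norm_factors <- c(S1 = 1.0536160, S2 = 0.9491124)

# ASSUMPTION: library size = column sum of the toy rows only.
lib_size <- colSums(counts)

# Effective library size = library size * normalization factor.
eff_lib_size <- lib_size * norm_factors

# TMM-normalized CPM: count / effective library size * 1e6.
tmm_cpm <- sweep(counts, 2, eff_lib_size, "/") * 1e6

# Plain CPM for comparison: count / library size * 1e6.
plain_cpm <- sweep(counts, 2, lib_size, "/") * 1e6

print(round(tmm_cpm, 2))
print(round(plain_cpm, 2))
```

Dividing a count by the normalization factor alone skips the library size entirely, which is why neither 4 / 1.0536160 nor 4 * 1.0536160 matches the cpm value.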

Also, DESeq doesn't just do a simple division by library size. For each library, the scaling (size) factor is the median, across genes, of the ratio of the gene's count in that library to the gene's geometric mean across all libraries.
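The median-of-ratios idea can be sketched in a few lines of base R. This is a hand-rolled illustration, not a call into the DESeq package: the variable names are made up, the counts are again the nine rows from the question, and genes with a zero count in any library are dropped because their geometric mean is zero.

```r
# Toy counts: the nine rows shown in the question (gene names are placeholders).
counts <- matrix(
  c(4, 0, 65, 0, 26, 119, 68, 12, 0,
    2, 8, 62, 2, 55, 252, 70, 20, 2),
  ncol = 2,
  dimnames = list(paste0("gene", 1:9), c("S1", "S2"))
)

# Per-gene geometric mean across libraries; any zero count gives 0 here.
geo_mean <- exp(rowMeans(log(counts)))

# Keep only genes with a positive geometric mean.
keep <- geo_mean > 0

# Size factor per library: median over genes of count / geometric mean.
size_factors <- apply(
  counts[keep, , drop = FALSE], 2,
  function(lib) median(lib / geo_mean[keep])
)

# Normalized counts: divide each library (column) by its size factor.
normalized <- sweep(counts, 2, size_factors, "/")

print(size_factors)
```

Note that the size factor already absorbs sequencing depth, so there is no separate division by library size in this scheme.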

Ngsnewbie replied (5.8 years ago):

Thanks, Damian, for the correction :)

k.kathirvel93 replied (18 months ago):

Hi Damian Kao, I am trying TMM normalization on my miRNA-seq data. I am new to R programming, so could you tell me: 1. What should my input data look like? (I have raw counts.) 2. Could I get the R code for TMM normalization? Thanks in advance.
Devon Ryan wrote (5.8 years ago):

The entire point behind TMM normalization is to not rely solely on summed count numbers (e.g., cpm, aka counts per million). So it's unclear why you'd find it surprising that multiplying or dividing the raw count by the library size normalization factor won't produce the counts per million. BTW, this will also be the case for DESeq, where the same computation also won't be equivalent to cpm. The next step is to estimateCommonDisp(edger) and so on. See the edgeR vignette.
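For anyone wanting the steps end to end, here is a rough sketch of the classic edgeR workflow. Treat it as an outline under assumptions rather than a recipe for any particular dataset: the counts are simulated, the names `counts` and `group` are placeholders, and two replicates per group are invented so that a dispersion can be estimated. The input is a matrix of raw counts with genes in rows and samples in columns. Note that in the question `edger` holds the numeric vector returned by calcNormFactors on a matrix, whereas the dispersion functions expect a DGEList, so the sketch builds one first. The edgeR part is skipped if the package is not installed.

```r
# Simulated raw counts: 200 genes x 4 samples (purely illustrative data).
set.seed(1)
n_genes <- 200
group <- factor(c("A", "A", "B", "B"))
counts <- matrix(
  rnbinom(n_genes * 4, mu = 50, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), c("A1", "A2", "B1", "B2"))
)

if (requireNamespace("edgeR", quietly = TRUE)) {
  library(edgeR)

  # Wrap counts and group labels in a DGEList.
  y <- DGEList(counts = counts, group = group)

  # TMM normalization factors (TMM is the default method).
  y <- calcNormFactors(y)

  # CPM on a DGEList uses lib.size * norm.factors.
  tmm_cpm <- cpm(y)

  # The same values computed by hand from the formula in the thread.
  manual_cpm <- sweep(y$counts, 2,
                      y$samples$lib.size * y$samples$norm.factors, "/") * 1e6

  # Dispersion estimation, then an exact test between the two groups.
  y <- estimateCommonDisp(y)
  y <- estimateTagwiseDisp(y)
  et <- exactTest(y)
  print(topTags(et, n = 5))
}
```

The normalization factors are not used to rewrite the counts themselves; they stay in the DGEList and enter the downstream tests through the effective library sizes.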
