Question

After Getting Normalization Factors Via edgeR, What To Do For Normalization?

10.5 years ago

Ngsnewbie ▴ 380

Dear All,

This is a pretty simple question, but I am getting confused.

I have raw count data and I am using edgeR (TMM).

I got the normalization factors with the calcNormFactors function in the edgeR package. I also got final normalized values using the cpm function.

Now, when I divide the raw counts by the corresponding library's normalization factor (I also tried multiplying), the values I get are not the same as those returned directly by the cpm function.

In the case of DESeq it seems pretty simple: as I understand it, you just divide the raw counts by the library size.

What is the next calculation for normalization after getting the scaling factors (here, the values from calcNormFactors)?

head(tab)
ID    S1    S2
CA_gi|502076645|ref|XM_004485358.1|    4    2
CA_gi|502076654|ref|XM_004485361.1|    0    8
CA_gi|502076657|ref|XM_004485362.1|    65    62
CA_gi|502076684|ref|XM_004485369.1|    0    2
CA_gi|502076687|ref|XM_004485370.1|    26    55
CA_gi|502076690|ref|XM_004485371.1|    119    252
CA_gi|502076693|ref|XM_004485372.1|    68    70
CA_gi|502076703|ref|XM_004485375.1|    12    20
CA_gi|502076706|ref|XM_004485376.1|    0    2


 edger<-calcNormFactors(tab)
 edger
[1] 1.0536160 0.9491124


 head(cpm(tab))
ID    S1    S2
CA_gi|502076645|ref|XM_004485358.1|   3.90172   1.786435
CA_gi|502076654|ref|XM_004485361.1|   0.00000   7.145741
CA_gi|502076657|ref|XM_004485362.1|  63.40294  55.379492
CA_gi|502076684|ref|XM_004485369.1|   0.00000   1.786435
CA_gi|502076687|ref|XM_004485370.1|  25.36118  49.126969
CA_gi|502076690|ref|XM_004485371.1| 116.07616 225.090840

For example, for the first gene in sample S1:

4 / 1.0536160 = 3.79644956 (not equal to 3.90172), and 4 * 1.0536160 = 4.214464 (again not equal to 3.90172)

normalization edger • 14k views

score 9 · Answer 1 · 2013-10-21

10.5 years ago

Damian Kao 16k

The TMM-normalized values are:

count / (library size * normalization factor)

Then multiply that by a million to get CPM.

Not

count / normalization factor
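To make the formula above concrete, here is a minimal base R sketch. The counts and normalization factors are copied from the question, but the library sizes are not given anywhere in the thread, so the `lib_sizes` values below are made-up placeholders and the resulting numbers will not match the cpm output in the question.

```r
# Counts for the first three genes, copied from the question.
counts <- matrix(c(4, 0, 65,    # S1
                   2, 8, 62),   # S2
                 ncol = 2,
                 dimnames = list(c("gene1", "gene2", "gene3"), c("S1", "S2")))

# Normalization factors reported by calcNormFactors() in the question.
norm_factors <- c(S1 = 1.0536160, S2 = 0.9491124)

# ASSUMPTION: the real library sizes are not shown in the thread,
# so these are hypothetical placeholder values.
lib_sizes <- c(S1 = 1e6, S2 = 1.2e6)

# Effective library size = library size * normalization factor.
eff_lib_sizes <- lib_sizes * norm_factors

# Divide each column by its effective library size, then scale to per million.
tmm_cpm <- sweep(counts, 2, eff_lib_sizes, "/") * 1e6
print(tmm_cpm)
```

The point is that the normalization factor rescales the library size, so it never gets applied to the raw count on its own.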

And DESeq doesn't just do a simple division by library size. For each library, the scaling factor is the median, taken across genes, of the ratio of that library's count to the gene's geometric mean across all libraries.
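The median-of-ratios idea can be sketched in a few lines of base R. This is only an illustration of the calculation described above, not DESeq's actual code; the counts are a handful of rows from the question, and the variable names are my own.

```r
# A few rows of counts from the question (one row contains a zero).
counts <- matrix(c(0, 65, 26, 119, 68, 12,    # S1
                   8, 62, 55, 252, 70, 20),   # S2
                 ncol = 2,
                 dimnames = list(paste0("gene", 1:6), c("S1", "S2")))

# Log of each gene's geometric mean across samples.
# Genes with a zero count get -Inf here and are skipped below.
log_geo_means <- rowMeans(log(counts))
keep <- is.finite(log_geo_means)

# Per-sample scaling factor: median over genes of count / geometric mean,
# computed on the log scale and transformed back.
size_factors <- apply(counts, 2, function(col) {
  exp(median((log(col) - log_geo_means)[keep]))
})

# Normalized counts: each column divided by its scaling factor.
norm_counts <- sweep(counts, 2, size_factors, "/")
print(size_factors)
```

Note that nothing here divides by the total library size, which is why this is not equivalent to cpm either.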

Thanks Damian for the rectification :)

ADD REPLY • link 10.5 years ago by Ngsnewbie ▴ 380

Hi Damian Kao, I am trying TMM normalization with my miRNA-seq data. I am new to R programming, so can you tell me: 1. What should my input data look like? (I have raw counts.) 2. Can I get the R code for TMM normalization? Thanks in advance.

ADD REPLY • link 6.2 years ago by k.kathirvel93 ▴ 300

score 3 · Answer 2 · 2013-10-21

The entire point behind TMM normalization is to not rely solely on summed count numbers (e.g., cpm, aka counts per million). So it's unclear why you'd find it surprising that multiplying or dividing the raw counts by the normalization factor won't produce the counts per million. BTW, this will also be the case for DESeq, where the same computation also won't be equivalent to cpm. The next step is to estimateCommonDisp(edger) and so on. See the edgeR vignette.
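For anyone looking for the steps in code, here is a rough sketch of the usual edgeR sequence. It assumes the counts are a numeric matrix with genes in rows and samples in columns; the simulated counts and the two-group design are placeholders of mine, not data from this thread. As far as I know, calling cpm on a plain matrix uses only the column sums, so the sketch keeps everything in a DGEList, where the normalization factors are stored and used.

```r
library(edgeR)

# ASSUMPTION: simulated counts standing in for a real raw count table
# (genes in rows, samples in columns).
set.seed(1)
counts <- matrix(rnbinom(2000, mu = 50, size = 10), ncol = 4,
                 dimnames = list(paste0("gene", 1:500), paste0("S", 1:4)))

# ASSUMPTION: a hypothetical design with two groups of two replicates.
group <- factor(c("A", "A", "B", "B"))

# Wrap the counts in a DGEList and compute TMM factors (the default method).
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
print(y$samples)  # shows lib.size and norm.factors per sample

# cpm() on a DGEList divides by lib.size * norm.factors.
tmm_cpm <- cpm(y)

# Manual version of the same calculation, for comparison.
eff_lib <- y$samples$lib.size * y$samples$norm.factors
manual_cpm <- t(t(counts) / eff_lib) * 1e6
print(all.equal(unname(tmm_cpm), unname(manual_cpm)))

# Next step mentioned above: estimate the common dispersion.
y <- estimateCommonDisp(y)
```

The normalized CPM values are mainly for inspection and plotting; the downstream edgeR steps work from the DGEList itself, and the vignette covers them in detail.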