Question

how to normalize row counts before drawing heatmap?

4.9 years ago

smyiz ▴ 30

I have raw counts and edgeR differential expression results, and I want to draw a heatmap of logFC values. I have 12 samples: two cell lines, each with triplicate total and IP libraries.

(cell1T1, cell1T2, cell1T3, cell1IP1, cell1IP2, cell1IP3, 
 cell2T1, cell2T2, cell2T3, cell2IP1, cell2IP2, cell2IP3)

I want to normalize the count data by calculating a scaling factor, CPM and the fold change (IP/total). My R script:

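cs = colSums(count)                      # library size (total counts) per sample
scale_factor <- 1e6 / colSums(count)     # per-sample factor converting counts to CPM
scale_factor
data = t(t(count) / cs) * 1e6            # divide each column by its library size, scale to 1e6
cs2 = colSums(data)                      # sanity check: every column should sum to 1e6
cs2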

> cs = colSums(count)
cell1T1     cell1T2    cell1T3      cell1IP1     cell1IP2     cell1IP3 
9061105     6832076    1472003      12019856     5921757      2835648 
cell2T1     cell2T2    cell2T3      cell2IP1     cell2IP2     cell2IP3 
4696948     4387729    3907566      7580533      14312254     19052159

> scale_factor <-  1e6 / colSums(count)
> scale_factor

cell1T1     cell1T2    cell1T3      cell1IP1     cell1IP2     cell1IP3 
0.11036182  0.14636840 0.67934644   0.08319567   0.16886880   0.35265308 
cell2T1     cell2T2    cell2T3      cell2IP1     cell2IP2     cell2IP3
0.21290421  0.22790833 0.25591378   0.13191685   0.06987020   0.05248749

> data = t( t(count)/cs) * 1e6
> cs2 = colSums(data)
> cs2 

cell1T1     cell1T2    cell1T3      cell1IP1     cell1IP2     cell1IP3 
1e+06       1e+06      1e+06        1e+06        1e+06        1e+06 
cell2T1     cell2T2    cell2T3      cell2IP1     cell2IP2     cell2IP3
1e+06       1e+06      1e+06        1e+06        1e+06        1e+06
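For reference, a minimal sketch of the same CPM calculation on a small hypothetical matrix (the gene names, sample names and counts are made up). It applies scale_factor column-wise with sweep( ), checks that this matches the transpose trick t(t(count) / cs) * 1e6, and confirms that every column sums to one million:

```r
# Hypothetical toy counts: 3 genes x 4 samples (not real data);
# values fill the matrix column by column
count <- matrix(c(10, 20, 70,
                  5, 15, 30,
                  100, 0, 100,
                  1, 2, 7),
                nrow = 3,
                dimnames = list(paste0("gene", 1:3),
                                c("s1", "s2", "s3", "s4")))

# Library size (total counts) per sample
cs <- colSums(count)

# Per-sample scale factor that converts raw counts to counts per million
scale_factor <- 1e6 / cs

# Multiply every column by its own scale factor (MARGIN = 2 means columns)
cpm_manual <- sweep(count, 2, scale_factor, `*`)

# The transpose-based version from the question gives the same matrix
cpm_transpose <- t(t(count) / cs) * 1e6

print(colSums(cpm_manual))  # every column sums to 1e6
```

Both forms divide each column by its own library size, so they are interchangeable; sweep( ) just makes the column-wise intent explicit.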

All columns sum to 1e6 (one million). Are these CPM values? And after that, how can I compute the fold changes between IP and total?

R RNA-Seq heatmap • 3.3k views

Hi,

You can apply z-score standardization to the edgeR-normalized counts.

To compute z-scores gene-wise, transpose the matrix (scale() works on columns), apply scale(), and then transpose the result back:

z_edgeRnormcounts = t(scale(t(edgeRnormcounts), center = TRUE, scale = TRUE))

You can then use these z-scores to plot a heatmap of your genes of interest.
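A small sketch of the same call on a made-up matrix standing in for edgeR-normalized values (the numbers and names are hypothetical). After scaling, every gene (row) has mean 0 and standard deviation 1:

```r
# Hypothetical normalized expression matrix: 3 genes x 4 samples (made-up values)
edgeRnormcounts <- matrix(c(1, 2, 3, 4,
                            10, 10, 20, 40,
                            5, 6, 5, 6),
                          nrow = 3, byrow = TRUE,
                          dimnames = list(paste0("gene", 1:3),
                                          paste0("s", 1:4)))

# scale() standardizes columns, so transpose to put genes in columns,
# scale, then transpose back so genes are rows again
z_edgeRnormcounts <- t(scale(t(edgeRnormcounts), center = TRUE, scale = TRUE))

# Each row now has mean 0 and (sample) standard deviation 1
round(rowMeans(z_edgeRnormcounts), 10)
apply(z_edgeRnormcounts, 1, sd)
```

A gene with identical values in every sample has a standard deviation of zero, so scale( ) returns NaN for its row; such genes are usually filtered out before plotting.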

4.9 years ago by Ankit ▴ 500

score 1 · Answer 1 · 2019-06-07

Usually the packages used to analyse differential expression separate exploratory analyses (such as clustering, PCA, heatmaps, etc.) from the actual differential expression testing.

edgeR provides the cpm( ) function, which computes counts-per-million from raw counts, or moderated log2-counts-per-million when log = TRUE. If you pass a DGEList object to cpm( ), it uses the normalized library sizes in the calculation; if you pass a plain matrix, as in cpm(count, log = FALSE), then I think the result will be the same as yours above. You can probably use the cpmByGroup( ) function to calculate fold changes, but this is not the preferred method.
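A rough base-R sketch in the spirit of averaging CPM by group (this is not cpmByGroup( ) itself and not edgeR's model-based logFC). The toy counts, the group labels and the 0.5 CPM pseudocount that avoids log2(0) are all assumptions made for illustration. It computes log2(mean IP CPM / mean total CPM) per gene and per cell line:

```r
# Hypothetical raw counts: 2 genes x 12 samples, ordered as in the question;
# every library totals 1000 reads so the numbers are easy to follow
samples <- c(paste0("cell1T", 1:3), paste0("cell1IP", 1:3),
             paste0("cell2T", 1:3), paste0("cell2IP", 1:3))
count <- rbind(
  gene1 = c(rep(100, 3), rep(400, 3), rep(200, 3), rep(200, 3)),
  gene2 = c(rep(900, 3), rep(600, 3), rep(800, 3), rep(800, 3))
)
colnames(count) <- samples

# Sample annotation matching the column order
cell  <- rep(c("cell1", "cell2"), each = 6)
assay <- rep(rep(c("T", "IP"), each = 3), times = 2)

# Counts per million, as in the question
cpm_mat <- t(t(count) / colSums(count)) * 1e6

# Hypothetical small pseudocount (on the CPM scale) so log2 never sees zero
pseudo <- 0.5

# For each cell line: log2(mean IP CPM / mean total CPM), one value per gene
lfc <- sapply(c("cell1", "cell2"), function(cl) {
  ip    <- rowMeans(cpm_mat[, cell == cl & assay == "IP", drop = FALSE])
  total <- rowMeans(cpm_mat[, cell == cl & assay == "T",  drop = FALSE])
  log2((ip + pseudo) / (total + pseudo))
})
round(lfc, 3)
```

The resulting genes-by-cell-line matrix has the shape one would hand to a heatmap function; for the values actually reported, the logFC from the edgeR model fit is the better source.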

In edgeR, differential expression testing, including fold-change estimation, is performed on untransformed counts. There are several methodologies for DE modeling and testing in edgeR (such as glmQLFit( ) / glmQLFTest( ), glmFit( ) / glmLRT( ), and others); the fold changes are then extracted from these results, e.g. with topTags( ).
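One way to set up the model for this experiment, sketched under the assumption that the IP-vs-total effect should be estimated separately within each cell line (the factor names cell and assay are made up here). Only the design matrix is built below, in base R; the edgeR calls that would consume it are shown as comments and are not run:

```r
# Sample annotation for the 12 libraries, in the order used in the question
cell  <- factor(rep(c("cell1", "cell2"), each = 6))
assay <- factor(rep(rep(c("T", "IP"), each = 3), times = 2),
                levels = c("T", "IP"))   # total is the reference level

# Cell-line main effect plus a separate IP-vs-total effect for each cell line
design <- model.matrix(~ cell + cell:assay)
colnames(design)

# With edgeR loaded and y a DGEList of the raw counts, a QL pipeline
# could then look like this (not run here):
# y   <- calcNormFactors(y)
# y   <- estimateDisp(y, design)
# fit <- glmQLFit(y, design)
# res <- glmQLFTest(fit, coef = "cellcell1:assayIP")
# topTags(res)   # logFC column: log2(IP / total) within cell1
```

Testing the "cellcell2:assayIP" coefficient instead gives the IP/total fold changes for the second cell line, so both columns of a logFC heatmap come from the same fit.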