correlation between data
1
0
2.5 years ago
star ▴ 280

I have ChIP-seq data from several studies and would like to normalise them with the TMM and upper-quartile methods from the edgeR package, then see which method works better for my data.

As you can see in the tables below, the normalized values differ between the two methods, but when I compute the correlation and draw a heatmap, all the values are the same.

• Is computing the correlation a good way to compare the methods, and why are all the values from cor() the same?
• Is it correct to draw a heatmap from the correlation result?

> dge <- DGEList(counts=data)

> data_upperquartile <- calcNormFactors(dge, method="upperquartile")

> data_upperquartile <- data.frame(cpm(data_upperquartile, normalized.lib.sizes = TRUE))

> data_upperquartile[c(100:105),c(1:3)]

        A         B          C
0.1007585 0.1230328 0.01741683
0.1151526 0.1730148 0.03483366
0.1439407 0.2268417 0.04644487
0.1727289 0.2768238 0.05225048
0.1631328 0.2460656 0.04644487
0.1103546 0.1461014 0.02902805

> data_TMM <- calcNormFactors(dge, method="TMM")

> data_TMM <- data.frame(cpm(data_TMM, normalized.lib.sizes = TRUE))

> data_TMM[c(100:105),c(1:3)]

         A         B          C
0.09484844 0.1153246 0.01901974
0.10839821 0.1621753 0.03803947
0.13549776 0.2126298 0.05071930
0.16259732 0.2594804 0.05705921
0.15356413 0.2306493 0.05071930
0.10388162 0.1369480 0.03169956

> cor_data_upperquartile <- cor(data_upperquartile)

          A         B         C
A 1.0000000 0.9878731 0.9383675
B 0.9878731 1.0000000 0.9739410
C 0.9383675 0.9739410 1.0000000

> cor_data_TMM <- cor(data_TMM)

          A         B         C
A 1.0000000 0.9878731 0.9383675
B 0.9878731 1.0000000 0.9739410
C 0.9383675 0.9739410 1.0000000

R dataframe correlation ggplot edgeR • 610 views
0
2.5 years ago
Dinara • 0

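Normalization doesn't change the correlation here. It is a mathematical fact that cor(x, y) = cor(ax, by) when a and b are positive scalars. TMM and upper-quartile normalization each rescale every sample by a single positive factor, so the correlation matrix comes out the same for both.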
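The invariance can be checked directly in base R. The sketch below uses simulated counts and made-up scaling factors (both are assumptions for illustration, not the poster's data or the factors edgeR would compute):

```r
# Simulated count matrix: 1000 regions x 3 samples (illustrative data only).
set.seed(1)
counts <- matrix(rpois(3000, lambda = 20), ncol = 3,
                 dimnames = list(NULL, c("A", "B", "C")))

# Two hypothetical sets of positive per-sample scaling factors, standing in
# for what two different linear normalization methods might produce.
factors_1 <- c(A = 0.8, B = 1.1, C = 1.3)
factors_2 <- c(A = 1.2, B = 0.9, C = 0.7)

# Divide each column (sample) by its factor.
norm_1 <- sweep(counts, 2, factors_1, "/")
norm_2 <- sweep(counts, 2, factors_2, "/")

# The normalized values differ, but the correlation matrices do not.
values_differ <- !isTRUE(all.equal(norm_1, norm_2))
same_cor <- isTRUE(all.equal(cor(norm_1), cor(norm_2))) &&
  isTRUE(all.equal(cor(counts), cor(norm_1)))
print(c(values_differ = values_differ, same_cor = same_cor))
```

Because the correlation matrix is identical under any such scaling, it cannot be used to rank one scaling method against another.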

1

As a remark, that is only true if the normalization uses linear scaling factors, as in TMM or the geometric-mean approach of DESeq2. If you do something like quantile normalization or loess regression, the correlation will change dramatically.
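To illustrate the contrast, here is a minimal base R sketch of quantile normalization applied to simulated data. The function `quantile_normalize` is a hypothetical helper written for this example, not a function from edgeR or any other package, and it assumes no missing values:

```r
# Hypothetical helper: minimal quantile normalization (no missing values).
quantile_normalize <- function(x) {
  # Rank of each value within its own column.
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted values across samples.
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value of the same rank.
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
# Simulated, skewed signal for 3 samples with different distribution shapes.
base <- rlnorm(1000)
x <- cbind(A = base * rlnorm(1000, sdlog = 0.2),
           B = base^1.5 * rlnorm(1000, sdlog = 0.2),
           C = sqrt(base) * rlnorm(1000, sdlog = 0.2))

pearson_before <- cor(x)
pearson_after  <- cor(quantile_normalize(x))

# Unlike per-sample scaling, this transformation is nonlinear, so the
# Pearson correlation matrix is no longer the same.
pearson_changed <- !isTRUE(all.equal(pearson_before, pearson_after))
print(pearson_changed)
```

How much the correlation moves depends on how different the sample distributions were to begin with.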

0

Thanks for your reply. So how can I find which method is better?

1

I recommend reading the csaw manual on ChIP-seq normalization. It explains the concepts quite nicely and contains code to plot MA plots to visually check the normalization "efficiency".
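As a rough idea of what an MA plot computes, here is a base R sketch on simulated counts. It is not the code from the csaw manual: `ma_values` is a hypothetical helper, and the data and the scaling factor of 2 are assumptions chosen for illustration.

```r
# Hypothetical helper: M and A values for two samples.
# A small prior value avoids taking log2 of zero.
ma_values <- function(x, y, prior = 0.5) {
  lx <- log2(x + prior)
  ly <- log2(y + prior)
  data.frame(A = (lx + ly) / 2,  # average log-signal
             M = lx - ly)        # log-fold change between the samples
}

set.seed(1)
# Simulated counts in which sample b was sequenced about twice as deeply.
mu <- rgamma(2000, shape = 2, rate = 0.05)
a <- rpois(2000, mu)
b <- rpois(2000, 2 * mu)

raw    <- ma_values(a, b)
scaled <- ma_values(a, b / 2)  # assumed scaling factor of 2

# After a suitable normalization the bulk of M values should centre on zero.
print(c(raw = median(raw$M), scaled = median(scaled$M)))

# To draw the plot: plot(scaled$A, scaled$M, pch = "."); abline(h = 0)
```

Comparing such plots for each normalization method shows whether the bulk of the points ends up centred on M = 0, which is something the correlation matrix cannot reveal.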