Should centering and scaling be done on methylation beta values for principal component analysis (PCA)?
3 months ago
Pratik ★ 1.0k

Hello Biostars Community,

I am trying to see how similar or different my samples are and whether they cluster into their sample groups, but I am unsure whether centering and scaling should both be done. I think centering should be done, but I am unsure about scaling.

I would think that maybe scaling shouldn't be done, since all the values are on the same beta value scale of 0 to 1, with most values near 0 or 1 and some in between:

The distribution looks something like the black "All" line below:

[Image: beta value distribution]

The PCA clustering does "look better" with both centering and scaling, meaning the samples fall into their respective groups more cleanly than with centering alone (no scaling).

I am pretty sure centering should be done after watching some StatQuest videos on PCA. I am just unsure whether scaling should be done, and whether the distribution of the data matters.

Image obtained from: https://zhou-lab.github.io/sesame/dev/supplemental.html#Quality_Control

Thank you in advance.

  • Pratik
PCA PCAtools • 485 views

It's hard for me to imagine that scaling is important for methylation data. Scaling only controls how much weight each probe gets, and it's hard to believe that probes with less variance should carry the same weight as those with larger variance.
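To make the weighting point concrete, here is a minimal sketch using base R's `prcomp()` rather than PCAtools, on simulated beta values (the matrix `betas`, the probe counts, and the Beta distribution parameters are all made up for illustration, not taken from the data in this thread). It compares how much of PC1 is carried by a small set of highly variable probes with and without scaling.

```r
set.seed(1)
n_samples <- 20

# Simulated beta values, probes in rows and samples in columns (illustrative only).
# 80 probes near 0 and 80 near 1 with little spread, 40 intermediate and variable.
low  <- matrix(rbeta(80 * n_samples, 1, 30), nrow = 80)
high <- matrix(rbeta(80 * n_samples, 30, 1), nrow = 80)
mid  <- matrix(rbeta(40 * n_samples, 2, 2),  nrow = 40)
betas <- rbind(low, high, mid)
rownames(betas) <- paste0("probe", seq_len(nrow(betas)))

# prcomp() expects samples in rows, so transpose the probe-by-sample matrix.
pca_centered <- prcomp(t(betas), center = TRUE, scale. = FALSE)
pca_scaled   <- prcomp(t(betas), center = TRUE, scale. = TRUE)

# Each loading vector has unit length, so the sum of squared PC1 loadings
# over a set of probes is that set's share of PC1.
mid_idx <- 161:200
share_centered <- sum(pca_centered$rotation[mid_idx, 1]^2)
share_scaled   <- sum(pca_scaled$rotation[mid_idx, 1]^2)

print(round(c(centered_only = share_centered, scaled = share_scaled), 3))
```

Without scaling, the variable probes should dominate PC1; with scaling, every probe is given unit variance, so the near-constant probes get as much say as the variable ones. Whether that is desirable is exactly the question here.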

That being said, is it possible this is an artifact of probe design? Say the chip has fewer probes in regions with less variation and more in regions with more variation. Then those highly variable regions are already over-represented.


Thank you for taking the time to reply, Zhenyu Zhang.

The beta values for the DMRs look more or less uniform across sample groups. That is, where there is a DMR, all samples within the sample group show the same/very similar methylation beta value. There are just a few DMRs that I saw where a few samples show different/variable methylation from the rest of the sample group.

What would you do:

To scale (the two groups on the left, red and green, are matched normal and tumor samples; the two groups on the right, blue and purple, are also matched normal and tumor samples):

library(PCAtools)
# Center and scale each probe; removeVar = 0.1 drops the 10% least variable probes first
p <- pca(mat = all_betas, metadata = metadata, center = TRUE, scale = TRUE, removeVar = 0.1)
biplot(p, colby = "Sample Group", lab = NA)

[Image: scaled]

Or not to scale:

# Center only; probes keep their original variance
p <- pca(mat = all_betas, metadata = metadata, center = TRUE, scale = FALSE, removeVar = 0.1)
biplot(p, colby = "Sample Group", lab = NA)

[Image: centered but not scaled]
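One way to put a number on which plot "looks better" is to compare how tightly the samples sit around their group centroids on the first two PCs. The sketch below is only an illustration: `separation_ratio()` is a hypothetical helper (not part of PCAtools), it uses base R's `prcomp()` instead of `pca()`, and the toy matrix and group labels are simulated, not the data from this thread.

```r
# Hypothetical helper: between-group / within-group sum of squares
# computed on the first `n_pcs` principal component scores.
separation_ratio <- function(scores, groups, n_pcs = 2) {
  s <- scores[, seq_len(n_pcs), drop = FALSE]
  groups <- factor(groups)
  overall <- colMeans(s)
  between <- 0
  within <- 0
  for (g in levels(groups)) {
    sg <- s[groups == g, , drop = FALSE]
    centroid <- colMeans(sg)
    # Distance of the group centroid from the overall mean, weighted by group size
    between <- between + nrow(sg) * sum((centroid - overall)^2)
    # Spread of the group's samples around their own centroid
    within <- within + sum(sweep(sg, 2, centroid)^2)
  }
  between / within
}

# Toy data: 100 probes x 20 samples, with 10 probes shifted upward in "tumor".
set.seed(42)
groups <- rep(c("normal", "tumor"), each = 10)
toy_betas <- matrix(rbeta(100 * 20, 2, 2), nrow = 100)
shifted <- toy_betas[1:10, groups == "tumor"] + 0.4
toy_betas[1:10, groups == "tumor"] <- pmin(shifted, 1)  # keep values within [0, 1]

pca_centered <- prcomp(t(toy_betas), center = TRUE, scale. = FALSE)
pca_scaled   <- prcomp(t(toy_betas), center = TRUE, scale. = TRUE)

ratio_centered <- separation_ratio(pca_centered$x, groups)
ratio_scaled   <- separation_ratio(pca_scaled$x, groups)
print(c(centered_only = ratio_centered, scaled = ratio_scaled))
```

A higher ratio means tighter, better-separated groups on PC1 and PC2. It only describes the plot, though; it does not say which preprocessing is statistically more appropriate.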
