Question

Principal Component Analysis On SNP Markers. Scaled Or Unscaled?


10.6 years ago

714 ▴ 110

Hi guys,

I'm analysing a small subset of SNP markers found in a small number of genes. I am looking at whether there is some differentiation between two sub-populations using a PCA.

I have been reading around and, from what I gather, scaling the variables in genetic analyses is not a good idea because they all share the same units. However, I was wondering whether there are circumstances in which scaling would be a good idea, for example in my case.

Thanks in advance!

pca • 3.8k views

updated 10.4 years ago by alex21 • written 10.6 years ago by 714 ▴ 110

score 0 · Answer 1 · 2013-09-19


10.6 years ago

Joseph Hughes ★ 3.0k

I can only think of good reasons not to scale.


score 0 · Answer 2 · 2013-11-15

Scaling is recommended when:

1. Variables have different units
2. Variables have values on different scales (e.g., variable "x" ranges from 1-100 and variable "y" ranges from 10,000-100,000)
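
To make the second case concrete, here is a minimal sketch using NumPy only. The data are simulated (two independent uniform variables on the ranges quoted above), and the helper `first_pc` is a hypothetical name introduced just for this illustration. It computes the first principal component with and without scaling each variable to unit variance.

```python
import numpy as np

# Simulated data (an assumption for illustration): "x" in 1-100, "y" in 10,000-100,000
rng = np.random.default_rng(0)
n_samples = 200
x = rng.uniform(1, 100, size=n_samples)
y = rng.uniform(10_000, 100_000, size=n_samples)
data = np.column_stack([x, y])


def first_pc(matrix, scale):
    """Return the absolute loadings of the first principal component.

    The columns are always mean-centred; if `scale` is True they are also
    divided by their standard deviation before the decomposition.
    """
    centered = matrix - matrix.mean(axis=0)
    if scale:
        centered = centered / centered.std(axis=0, ddof=1)
    # Rows of vt are the principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return np.abs(vt[0])


unscaled_pc1 = first_pc(data, scale=False)
scaled_pc1 = first_pc(data, scale=True)

print("PC1 loadings (x, y), unscaled:", unscaled_pc1)
print("PC1 loadings (x, y), scaled:  ", scaled_pc1)
```

Without scaling, the first component should load almost entirely on "y" simply because its variance is several orders of magnitude larger. After scaling, both variables contribute equally. SNP genotypes coded as allele counts all share the same 0-2 range, so this particular problem is much less pronounced there.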

In case #2, you lose information about the absolute values, but you gain the ability to compare variables that would otherwise not be comparable (in my experience, variables with large values dominate the first and second principal components when left unscaled). This situation is analogous to Spearman's versus Pearson's correlation: Pearson's correlation uses the actual values to measure the correlation between two variables, whereas Spearman's correlation uses the rank of each value within the dataset. Ranking is similar in spirit to the scaling done in PCA, since both discard the original magnitudes so that variables can be compared on an equal footing.
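
The Pearson/Spearman contrast can be sketched in a few lines. This is only an illustration on made-up data (a monotonic but strongly non-linear relationship), and `pearson` and `spearman` are hypothetical helper names. Spearman's coefficient is computed here as Pearson's coefficient on the ranks, which holds when there are no tied values.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation, computed from the actual values."""
    return np.corrcoef(a, b)[0, 1]


def spearman(a, b):
    """Spearman correlation as Pearson on ranks (assumes no tied values)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return pearson(rank_a, rank_b)


# Made-up data: y grows monotonically with x, but on a very different scale
x = np.arange(1, 11, dtype=float)
y = np.exp(x)

pearson_r = pearson(x, y)
spearman_r = spearman(x, y)

print("Pearson: ", pearson_r)
print("Spearman:", spearman_r)
```

Because the few largest values of `y` dominate, Pearson's coefficient comes out well below 1, while Spearman's is 1 because the ordering of the two variables is identical.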