Question: Principal Component Analysis On SNP Markers. Scaled Or Unscaled?
6.6 years ago, 714 wrote:

Hi guys,

I'm analysing a small subset of SNP markers found in a small number of genes. I am looking at whether there is some differentiation between two sub-populations using a PCA.

I have been reading around and from what I've inferred, scaling the variables in genetic analysis is not a good idea because the units are the same. However, I was wondering whether or not there are some circumstances in which scaling would be a good idea, in my case for example.

Thanks in advance!

6.6 years ago, Joseph Hughes (Scotland, UK) wrote:

I can only think of good reasons not to scale.

6.4 years ago, alex21 wrote:

Scaling is recommended when:

  1. Variables have different units
  2. Variables have values on different scales (e.g., variable "x" ranges from 1-100 and variable "y" ranges from 10,000-100,000)

In case #2, you lose information about the absolute values, but you gain the ability to compare two variables that otherwise wouldn't be comparable (I've found that unscaled variables with large values dominate the first and second principal components). The situation is loosely analogous to Pearson's versus Spearman's correlation: Pearson's correlation is computed from the actual values of the two variables, whereas Spearman's correlation is computed from the rank of each value within the dataset. Replacing values with ranks is similar in spirit to the scaling done in PCA: both discard the original magnitudes so that variables can be compared on a common footing.
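
To make case #2 concrete, here is a minimal sketch in plain Python. The five (x, y) rows are made-up numbers chosen only to match the ranges in the example above, and the helper names (`standardize`, `covariance_matrix`, `first_pc`) are my own. It finds the first principal component by power iteration rather than with a PCA library, so treat it as an illustration, not a recipe for real SNP data.

```python
import math
import statistics


def standardize(rows):
    """Center each column and divide it by its sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def covariance_matrix(rows):
    """Sample covariance matrix of a row-major table (rows are samples)."""
    n = len(rows)
    centered = []
    for col in zip(*rows):
        mean = statistics.mean(col)
        centered.append([v - mean for v in col])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
            for ci in centered]


def first_pc(rows, iterations=2000):
    """Unit-length loadings of the first principal component (power iteration)."""
    cov = covariance_matrix(rows)
    vec = [1.0 / (i + 1) for i in range(len(cov))]  # arbitrary start vector
    for _ in range(iterations):
        vec = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
    return vec


# Hypothetical data: x lies in 1-100, y lies in 10,000-100,000.
data = [
    [12, 15000],
    [35, 90000],
    [58, 42000],
    [77, 61000],
    [93, 28000],
]

raw_pc = first_pc(data)                  # unscaled: y carries almost all the variance
scaled_pc = first_pc(standardize(data))  # scaled: both variables have unit variance

print("unscaled loadings (x, y):", raw_pc)
print("scaled loadings   (x, y):", scaled_pc)
```

On the unscaled data the first component is essentially the y axis, simply because y has the larger variance. After standardizing, x and y contribute with equal weight. For SNP genotypes that are all coded the same way (e.g. 0/1/2), the columns already share a unit and a range, so this particular problem should not arise.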
