Principal Component Analysis On SNP Markers. Scaled Or Unscaled?
10.6 years ago
714 ▴ 110

Hi guys,

I'm analysing a small subset of SNP markers found in a small number of genes. I am looking at whether there is some differentiation between two sub-populations using a PCA.

I have been reading around and from what I've inferred, scaling the variables in genetic analysis is not a good idea because the units are the same. However, I was wondering whether or not there are some circumstances in which scaling would be a good idea, in my case for example.

Thanks in advance!

10.6 years ago
Joseph Hughes ★ 3.0k

I can only think of good reasons not to scale.

10.4 years ago
alex21 • 0

Scaling is recommended when:

  1. Variables have different units
  2. Variables have values on different scales (e.g., variable "x" ranges from 1-100 and variable "y" ranges from 10,000-100,000)

In case #2, you lose information about the absolute values, but you gain the ability to compare variables that otherwise wouldn't be comparable (in my experience, variables with large values dominate the first and second principal components). The situation is loosely analogous to Spearman's versus Pearson's correlation: Pearson's correlation uses the actual values of the two variables, whereas Spearman's correlation uses the rank of each value within its variable. Ranking, like scaling in PCA, discards the original magnitudes so that variables can be compared on a common footing.
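To make case #2 concrete, here is a minimal sketch in Python with NumPy. It uses made-up data, not SNP genotypes: two independent variables drawn on the ranges mentioned above. The helper name `pca_loadings_and_variance` and the simulated data are my own assumptions for illustration. The PCA is done by SVD on the centered data, with optional division by each column's standard deviation.

```python
import numpy as np


def pca_loadings_and_variance(X, scale):
    """PCA via SVD on centered data; optionally scale columns to unit variance.

    Returns (loadings, explained_variance_ratio); loadings[i] is the i-th PC.
    """
    Xc = X - X.mean(axis=0)
    if scale:
        Xc = Xc / Xc.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt, explained


# Synthetic, independent variables on very different scales (illustrative only).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 100, size=n)             # "x" ranges from 1 to 100
y = rng.uniform(10_000, 100_000, size=n)    # "y" ranges from 10,000 to 100,000
X = np.column_stack([x, y])

load_raw, var_raw = pca_loadings_and_variance(X, scale=False)
load_std, var_std = pca_loadings_and_variance(X, scale=True)

print("unscaled PC1 loadings:", np.round(load_raw[0], 4))
print("unscaled variance explained:", np.round(var_raw, 4))
print("scaled PC1 loadings:", np.round(load_std[0], 4))
print("scaled variance explained:", np.round(var_std, 4))
```

Without scaling, PC1 is essentially just "y", because its variance is orders of magnitude larger. With scaling, both variables contribute equally to PC1. When all variables are already on the same scale, as in the original question, this rescaling is not needed for that purpose.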
