Question

Understanding standardized genotype matrix


7.1 years ago

GabrielMontenegro ▴ 670

This might be really basic, but I am trying to understand why a standardized genotype matrix is constructed this way.

In a standardized genotype matrix, each element $w_{ij}$ is defined as:

$$w_{ij} = \frac{x_{ij} - 2p_i}{\sqrt{2p_i(1 - p_i)}}$$

where $x_{ij}$ is the number of copies of the reference allele at SNP $i$ in individual $j$, and $p_i$ is the frequency of the reference allele at SNP $i$.

I understand that the number of reference alleles an individual can carry depends on the allele frequency of that SNP, so standardizing by the allele frequency seems like a good idea. But why is it standardized in this particular way? Is it a common standardization? Does it have a name?

SNP genome • 4.1k views


score 5 · Accepted Answer · 2017-03-08


7.1 years ago

Ahill ★ 1.9k

Not a geneticist, but using your notation: for autosomes, x counts 0, 1, or 2 copies of the allele, so under Hardy-Weinberg equilibrium it behaves like a Binomial(2, p) draw. That makes 2p the expected (mean) value of x and sqrt(2p(1-p)) the standard deviation of x. So this is a z-score. The standardized values (w_ij) will have mean zero and variance 1 for each SNP, which I guess is a nice property for some calculations.
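To make the z-score reading concrete, here is a minimal sketch in Python with NumPy. It is not taken from any particular genetics package: the function name `standardize_genotypes` is hypothetical, the matrix is laid out with SNPs as rows and individuals as columns to match the i/j notation in the question, genotypes are simulated under the assumption of Hardy-Weinberg equilibrium, and p is estimated from the sample itself.

```python
import numpy as np


def standardize_genotypes(X):
    """Z-score a SNP-by-individual matrix of 0/1/2 allele counts.

    Hypothetical helper for illustration: rows are SNPs (i),
    columns are individuals (j).
    """
    X = np.asarray(X, dtype=float)
    # Estimated allele frequency per SNP: mean allele count divided by 2.
    p = X.mean(axis=1, keepdims=True) / 2.0
    # Standard deviation of a Binomial(2, p) count.
    sd = np.sqrt(2.0 * p * (1.0 - p))
    if np.any(sd == 0):
        raise ValueError("monomorphic SNP: standard deviation is zero")
    return (X - 2.0 * p) / sd


# Simulated data (assumption: Hardy-Weinberg equilibrium, independent SNPs).
rng = np.random.default_rng(0)
n_snps, n_ind = 5, 20000
true_p = rng.uniform(0.1, 0.9, size=n_snps)
X = rng.binomial(2, true_p[:, None], size=(n_snps, n_ind))

W = standardize_genotypes(X)
row_means = W.mean(axis=1)  # zero up to floating-point error
row_vars = W.var(axis=1)    # close to 1 when HWE roughly holds
print(np.round(row_means, 6))
print(np.round(row_vars, 3))
```

Because p is estimated from the same sample, each row mean is zero by construction. The row variance is only approximately 1, and it will drift away from 1 for SNPs that deviate from Hardy-Weinberg proportions.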
