How to calculate SNP effect size using Beta and Risk Allele Frequency?
23 months ago
BobN ▴ 10

Hi,

I’m not a scientist, but I really want to understand how the calculation of SNP effect size works.

I’ve seen visualisations of effect size on various websites that analyze raw DNA files from direct-to-consumer DNA testing companies, and I want to replicate them on my own.

Here is an example from dna.land:

They describe the chart as follows:

For each SNP, this effect is determined by how many effect alleles you possess (0, 1, or 2), the effect size of the SNP, and the frequency of the effect allele in your ancestry group. Each bar on the y-axis represents one SNP. The x-axis represents the effect of the SNP and can be interpreted as a distance in standard deviations from the mean trait score on a standard normal distribution.

I’ve tried to make the same chart myself, but it looks like I am missing some knowledge about how to do it.

I picked the same SNPs as in the example, looked them up in the GWAS Catalog, and added my genotype from my raw DNA file.

I put it all in a table:

SNP          My genotype   Effect allele   P-value       RAF       Beta                      CI
rs490647     AG            A               3 x 10^-7     0.24648   0.029943 unit increase    [0.018-0.041]
rs4653663    AT            A               2 x 10^-8     0.255     0.091 unit increase       [0.06-0.122]
rs12637928   AT            A               4 x 10^-8     0.49      0.077 unit decrease       [0.05-0.104]
rs12682352   TT            T               2 x 10^-15    0.525     0.115 unit increase       [0.088-0.142]
rs12378446   CT            T               9 x 10^-9     0.791     0.1 unit increase         [0.067-0.133]


At first I thought that for this visualisation I needed only the Beta coefficient and the number of risk alleles in my genotype, but it looks like I also need to take into account the Risk Allele Frequency (RAF) and my ancestry group.

Can someone help me understand:

1) How to calculate effect size using Beta and Risk Allele Frequency?

2) How to take into account my Ancestry group?

Edit:

I have one more question regarding the image with effect sizes in the first post.

How is it possible that I have both risk alleles but still don't have 100% of the potential effect?

For example, at rs12682352 my genotype is TT and the risk allele is T, but in the image the red part only goes about halfway to the right. How is that possible?

I thought you get the maximum effect size when you have both risk alleles, but it looks like that is not the maximum.


Regarding your new question, I think the blue bars show the effect across any/all populations; hence, being in a certain ancestry group X and having both risk alleles doesn't give you the full risk.

It is best to contact the providers and ask for clarification, rather than relying on guesswork from the web.


How is it possible that I have both risk alleles but still don't have 100% of potential effect?

• Incomplete penetrance of these alleles in your population?
• Unexplained environmental effects?
23 months ago

Well, it is no surprise that the methods written by fellow scientists are incoherent and difficult to follow. This is why we should be putting commented code into manuscript supplementary material, and not solely textual descriptions of the code.

Basically, reading that description, I have no idea what they did.

Just some general points:

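• Exponentiating the beta coefficient (e^beta) gives the odds ratio, provided the beta is a log-odds estimate from a logistic regression (a case/control trait). Betas reported as a "unit increase" or "unit decrease" for a quantitative trait, as in your table, stay on the trait's own scale.
• To account for the risk allele frequency in your own cohort (note that allele frequency is not the same thing as penetrance), there are likely several ways to do this via a regression model.
• Ancestry (population stratification) is usually accounted for by including, as covariates in the design formula, each sample's values on PC1 and PC2, or on however many PCs are needed to separate your populations.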
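
As a small illustration of the first point, here is a minimal sketch of the beta-to-odds-ratio conversion. It assumes the beta is a log-odds coefficient from a logistic regression; the function name and the example value 0.2 are made up for illustration and do not come from the table in the question, whose betas look like quantitative-trait effects and should not be exponentiated this way.

```python
import math


def beta_to_odds_ratio(beta: float) -> float:
    """Convert a log-odds beta (logistic regression) to an odds ratio."""
    return math.exp(beta)


# Hypothetical log-odds beta, NOT one of the SNPs in the question.
example_beta = 0.2
print(round(beta_to_odds_ratio(example_beta), 3))  # 1.221
```

A beta of 0 maps to an odds ratio of 1 (no effect), and a negative beta maps to an odds ratio below 1.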

The x-axis represents the effect of the SNP and can be interpreted as a distance in standard deviations from the mean trait score on a standard normal distribution.

They could have just stated that the values are Z-scores. The "standard normal distribution" is the Z distribution, which measures distance from the mean in standard deviations. I posted a previous answer on this: "SNP dataset and Z Score".


Thank you for the help, it looks like they really did use Z-scores.

But now I have one more question: how is it possible that I have both risk alleles but still don't have 100% of the potential effect? I have this situation with rs12682352 in the example above.
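
One possible reading of the chart (an assumption, not something dna.land's description confirms) is that each SNP's effect is measured relative to the average person in the reference population rather than relative to someone with zero effect alleles. The average person carries 2 x RAF copies of the effect allele, so the contribution would be beta x (your allele count - 2 x RAF). The sketch below works this through for rs12682352 using the values from the table; the function names are hypothetical, and the result is in standard deviations only if the reported beta is itself in standard-deviation units.

```python
def count_effect_alleles(genotype: str, effect_allele: str) -> int:
    """Number of copies (0, 1 or 2) of the effect allele in a genotype like 'TT'."""
    return genotype.upper().count(effect_allele.upper())


def centered_effect(genotype: str, effect_allele: str,
                    beta: float, raf: float) -> float:
    """Effect relative to the population mean: beta * (dosage - 2 * RAF).

    Assumption: 'beta' is signed (negative for a 'unit decrease') and 'raf'
    is the effect allele frequency in the relevant ancestry group.
    """
    dosage = count_effect_alleles(genotype, effect_allele)
    return beta * (dosage - 2.0 * raf)


# rs12682352 from the table: genotype TT, effect allele T, beta 0.115, RAF 0.525
effect = centered_effect("TT", "T", beta=0.115, raf=0.525)
uncentered_max = 2 * 0.115  # effect of two copies relative to zero copies

print(round(effect, 5))                   # 0.10925
print(round(effect / uncentered_max, 3))  # 0.475
```

Under this assumption, a TT genotype at a SNP whose effect allele is already common (RAF 0.525) sits only about half as far from the mean as the full 2 x beta, which would be consistent with a bar that reaches roughly halfway. The RAF listed in the GWAS Catalog comes from the study population, so a frequency from your own ancestry group could shift the result.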