Dealing with Multiallelic SNPs in GWAS
3.8 years ago
godth13teen ▴ 70

Hi, I'm quite new to GWAS. Based on my understanding so far, I have some questions.

Thank you for answering my question!

SNP GWAS • 2.3k views
3.8 years ago

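You include n-1 genotype columns in your regression, where n is the number of alleles at the site. (One allele, usually the highest-frequency one, must be omitted to avoid linear dependence in the regression: for a diploid sample the allele counts always sum to 2, so all n columns together would be collinear with the intercept.)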
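To make the linear-dependence point concrete, here is a minimal sketch using NumPy. The four genotypes and the choice of T as the omitted allele are made up for illustration; the only thing being checked is the rank of the design matrix with and without the omitted allele's column.

```python
import numpy as np

# Made-up diploid genotypes at one triallelic site (alleles T, C, A).
genotypes = [("T", "T"), ("C", "T"), ("C", "C"), ("A", "T")]
alleles = ["T", "A", "C"]  # T is the most frequent allele here, so it is the one we omit

# One count column per allele: how many copies of that allele each sample carries.
counts = np.array([[g.count(a) for a in alleles] for g in genotypes], dtype=float)
intercept = np.ones((len(genotypes), 1))

full = np.hstack([intercept, counts])            # intercept + all n allele columns
reduced = np.hstack([intercept, counts[:, 1:]])  # intercept + n-1 columns (T dropped)

print("full:   ", full.shape[1], "columns, rank", np.linalg.matrix_rank(full))
print("reduced:", reduced.shape[1], "columns, rank", np.linalg.matrix_rank(reduced))
```

The full matrix has more columns than its rank, because every row of allele counts sums to 2 (twice the intercept column). Dropping one allele's column restores full column rank, which is what the regression needs.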


Hi, I'm not so clear about your answer. Could you please explain a bit more? Thank you.


Suppose you have 4 samples; let's label them A, B, C, and D. Sample A has genotype T/T at this SNP, and phenotype value 175. Sample B has genotype C/T and phenotype value 160; sample C has genotype C/C and phenotype value 155; and sample D has genotype T/T and phenotype value 173.

A standard GWAS is based on [phenotype] ~ [genotype, intercept, other predictors] regressions. Ignoring "other predictors" for now, the data matrices for the regression at this SNP would look like

phenotype        intercept  #C
      175                1   0
      160                1   1
      155                1   2
      173                1   0

I've labeled the single genotype column "#C" here, representing "number of copies of the C allele".
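As a rough sketch, the single-SNP regression above can be fit with ordinary least squares in NumPy. This only recovers the coefficient estimates for the four samples; a real GWAS tool would also report standard errors and p-values, which are left out here.

```python
import numpy as np

# Phenotypes and C-allele counts for samples A, B, C, D (T/T, C/T, C/C, T/T).
phenotype = np.array([175.0, 160.0, 155.0, 173.0])
n_c = np.array([0.0, 1.0, 2.0, 0.0])

# Design matrix: intercept column followed by the "#C" column.
X = np.column_stack([np.ones_like(n_c), n_c])

# Ordinary least squares: phenotype ~ intercept + #C
beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
intercept_hat, effect_c = beta

print(f"intercept = {intercept_hat:.2f}, per-copy effect of C = {effect_c:.2f}")
```

The fitted coefficient on "#C" is the estimated change in phenotype per additional copy of the C allele.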

Now change sample D's genotype to A/T. This would leave the original data matrices unchanged, since neither A/T nor T/T has any copies of C. That may actually be fine for detecting whether the C allele has a noticeable effect, but we're now also interested in whether the A allele does. We investigate that by adding a #A column:

phenotype        intercept  #A  #C
      175                1   0   0
      160                1   0   1
      155                1   0   2
      173                1   1   0

Of course, with only 4 samples, we can't conclude much. But (with a good choice of "other predictors") this approach becomes quite effective as your sample size increases.
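The two-column design above can be built programmatically from genotype strings. The sketch below assumes genotypes are written as "X/Y" strings; the helper name `allele_count_columns` is hypothetical, not part of any GWAS package. T is the omitted allele, so it gets no column.

```python
import numpy as np

def allele_count_columns(genotypes, omitted):
    """Return (allele names, count matrix) with one column per allele except `omitted`."""
    names = sorted({a for g in genotypes for a in g.split("/")} - {omitted})
    counts = [[g.split("/").count(a) for a in names] for g in genotypes]
    return names, np.array(counts, dtype=float)

# Samples A, B, C, D after changing D's genotype to A/T.
genotypes = ["T/T", "C/T", "C/C", "A/T"]
phenotype = np.array([175.0, 160.0, 155.0, 173.0])

names, counts = allele_count_columns(genotypes, omitted="T")

# Design matrix: intercept, then one column per non-omitted allele (#A, #C).
X = np.column_stack([np.ones(len(genotypes)), counts])
beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)

for label, value in zip(["intercept"] + ["#" + n for n in names], beta):
    print(f"{label}: {value:.3f}")
```

Because sample D is the only carrier of A, the #A coefficient simply absorbs D's residual, which illustrates why four samples cannot tell us much.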


Ah, it's clear to me now, thank you

3.8 years ago
Asaf 10k
  1. The model is usually linear, so 0, 1, 2 is the number of minor alleles in the genotype (0 = homozygous major, 1 = heterozygous, 2 = homozygous minor), and the assumption is that two copies of the minor allele have twice the effect of one copy. This doesn't have to hold for every test and tool, but it is what I've seen. If there are several alternative minor alleles, they can be treated as separate SNPs, assumed to have the same effect, or avoided altogether.
  2. One way of dealing with epistasis could be to multiply the two SNPs' values and divide by 2 (to stay in the 0-2 range). I don't know of a tool that can do this, but statistically it should be valid (assuming a linear interaction and additive effects).
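A minimal sketch of the interaction term from point 2, using NumPy. The two genotype vectors are made up for illustration, and this only constructs the design matrix; it is not a description of how any particular tool models epistasis.

```python
import numpy as np

# Made-up minor-allele counts (0, 1, 2) for six samples at two SNPs.
snp1 = np.array([0, 1, 2, 2, 1, 0])
snp2 = np.array([0, 2, 1, 2, 1, 2])

# Interaction term: product of the two counts, divided by 2 so it stays in the 0-2 range.
interaction = snp1 * snp2 / 2.0

# Design matrix: intercept, both additive terms, and the interaction term.
X = np.column_stack([np.ones(len(snp1)), snp1, snp2, interaction])
print(X)
```

The interaction column is nonzero only for samples carrying at least one minor allele at both SNPs, and it reaches 2 only for the double homozygous-minor genotype.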

thank you for your comment
