Allele coding in BGENIE GWAS output
14 months ago
gokberk ▴ 70

Hi all, I have a quick question about BGENIE GWAS summary stats. In the output, the alleles are coded as a_0 and a_1, as in the following:

chr rsid pos a_0 a_1 af info pheno1_beta pheno1_se pheno1_t ...
22 22:16050075:A:G 16050075 A G 0.0001 1 0.00067749 0.01008 0.067215 ...
22 22:16050115:G:A 16050115 G A 0.00545 1 -0.00022679 0.010577 -0.021441 ...
22 22:16050213:C:T 16050213 C T 0.00635 1 -0.0053945 0.010732 -0.50266 ...
22 22:16050319:C:T 16050319 C T 0.00115 1 -0.0072811 0.010548 -0.69025 ...
22 22:16050527:C:A 16050527 C A 0.00045 1 -0.010907 0.011428 -0.95444 ...
22 22:16050568:C:A 16050568 C A 0.00025 1 -0.0024885 0.011269 -0.22083 ...
22 22:16050607:G:A 16050607 G A 0.0006 1 0.013246 0.010527 1.2583 ...
22 22:16050627:G:T 16050627 G T 0.0004 1 -0.00043928 0.01008 -0.04358 ...
...


In their manual, they say the following about the allele coding:

In the regression model we code the first and second alleles as 0 and 1 respectively, so the beta coefficient refers to the effect of having an extra copy of the second allele.

So, just to be sure there is no accidental A1<->A2 swap in the summary stats format: which allele (a_0 or a_1) is the reference (A1), and which one is the derived/effect allele (A2) in this context?
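To make my current reading concrete, here is a minimal Python sketch. It assumes that the manual sentence quoted above means the reported beta is per extra copy of a_1, so a_1 would be the effect allele and a_0 the reference. The helper name `align_effect` is my own and is not part of BGENIE; the example row is the first line of the output shown earlier.

```python
def align_effect(row, effect_allele):
    """Return (effect_allele, other_allele, beta) for the requested effect allele.

    ASSUMPTION (my reading of the manual, not confirmed): pheno1_beta is the
    effect of one extra copy of a_1, the second allele.
    """
    a0, a1 = row["a_0"], row["a_1"]
    beta = float(row["pheno1_beta"])
    if effect_allele == a1:
        # Requested allele already matches the assumed coding: keep the sign.
        return a1, a0, beta
    if effect_allele == a0:
        # Requested allele is the assumed reference: flip the sign of beta.
        return a0, a1, -beta
    raise ValueError(f"{effect_allele!r} is neither a_0 nor a_1 for {row['rsid']}")


# Header and first data row copied from the output above (trailing columns omitted).
header = "chr rsid pos a_0 a_1 af info pheno1_beta pheno1_se pheno1_t".split()
fields = "22 22:16050075:A:G 16050075 A G 0.0001 1 0.00067749 0.01008 0.067215".split()
row = dict(zip(header, fields))

print(align_effect(row, "G"))  # beta kept as reported under the assumption
print(align_effect(row, "A"))  # beta sign flipped under the assumption
```

If that assumption is wrong, the two branches would simply need to be swapped. The sketch deliberately leaves the af column alone, since I am not sure which allele it refers to.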

Cheers!

gwas bgenie summary statistics