Allele coding in BGENIE GWAS output
23 months ago
gokberk ▴ 70

Hi all, I have a quick question about BGENIE GWAS summary statistics. In the output, alleles are coded as a_0 and a_1, as in the following example:

chr rsid pos a_0 a_1 af info pheno1_beta pheno1_se pheno1_t ...
22 22:16050075:A:G 16050075 A G 0.0001 1 0.00067749 0.01008 0.067215 ...
22 22:16050115:G:A 16050115 G A 0.00545 1 -0.00022679 0.010577 -0.021441 ...
22 22:16050213:C:T 16050213 C T 0.00635 1 -0.0053945 0.010732 -0.50266 ...
22 22:16050319:C:T 16050319 C T 0.00115 1 -0.0072811 0.010548 -0.69025 ...
22 22:16050527:C:A 16050527 C A 0.00045 1 -0.010907 0.011428 -0.95444 ...
22 22:16050568:C:A 16050568 C A 0.00025 1 -0.0024885 0.011269 -0.22083 ...
22 22:16050607:G:A 16050607 G A 0.0006 1 0.013246 0.010527 1.2583 ...
22 22:16050627:G:T 16050627 G T 0.0004 1 -0.00043928 0.01008 -0.04358 ...

In their manual, they say the following about the allele coding:

In the regression model we code the first and second alleles as 0 and 1 respectively, so the beta coefficient refers to the effect of having an extra copy of the second allele.

Just to be sure that there is no A1<->A2 swap in the summary statistics format, I'd like to ask: which allele (a_0 or a_1) is the reference (A1), and which is the derived/effect allele (A2) in this context?


gwas bgenie summary statistics • 757 views
8 months ago
Al Murphy ▴ 30

I am not too familiar with BGENIE's summary statistics format, but the passage you quoted from their manual tells you that the second allele listed (a_1) is the effect allele (A2).

If you want to know more about reference and alternative alleles, and want a tool that standardises any GWAS sumstats file and checks that the effect direction is not reversed (a_0 and a_1 flipped), along with a whole host of other checks, have a look at the MungeSumstats R/Bioconductor package (I wrote this package). You can pass the GWAS sumstats to MungeSumstats in a single call, and it can correct flipped alleles, ensure all alleles are on the reference genome, infer the reference genome build, lift over between hg19 and hg38, etc.
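
To make the quoted convention concrete, here is a minimal Python sketch. It is not part of BGENIE or MungeSumstats; it only assumes the space-delimited header shown in the question and the manual's statement that beta is the effect per extra copy of a_1. The function names (`label_alleles`, `flip_to_a0`) and the output column names (`effect_allele`, `other_allele`) are made up for illustration.

```python
import csv
import io

def label_alleles(row):
    """Relabel one row: per the manual quote, a_1 is the counted (effect) allele."""
    out = dict(row)
    out["other_allele"] = out.pop("a_0")
    out["effect_allele"] = out.pop("a_1")
    return out

def flip_to_a0(row, beta_col="pheno1_beta", t_col="pheno1_t"):
    """Re-express the effect per copy of the original a_0: swap alleles, negate beta and t.

    The 'af' column is left untouched, because the thread does not state
    which allele it refers to (an open assumption to check in the manual).
    """
    out = dict(row)
    out["a_0"], out["a_1"] = row["a_1"], row["a_0"]
    out[beta_col] = -float(row[beta_col])
    out[t_col] = -float(row[t_col])
    return out

# Two rows copied from the question, without the trailing "..." columns.
sample = """chr rsid pos a_0 a_1 af info pheno1_beta pheno1_se pheno1_t
22 22:16050075:A:G 16050075 A G 0.0001 1 0.00067749 0.01008 0.067215
22 22:16050115:G:A 16050115 G A 0.00545 1 -0.00022679 0.010577 -0.021441
"""

rows = list(csv.DictReader(io.StringIO(sample), delimiter=" "))
labelled = [label_alleles(r) for r in rows]
flipped = [flip_to_a0(r) for r in rows]

for r in labelled:
    print(r["rsid"], "effect allele:", r["effect_allele"], "beta:", r["pheno1_beta"])
```

Under these assumptions, the first variant (22:16050075:A:G) has G as the effect allele, so its positive beta is the estimated change in the phenotype per extra copy of G. Flipping is only needed if a downstream tool expects the effect to be reported for the other allele.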

