Question: Impute presence of HLA-B27 antigen in individual genome
3.9 years ago by
cslarsen0 wrote:


I'm wondering if it's possible to impute the presence of the HLA-B27 antigen from a 23andMe genome? This is a genome of European origin.

I've tried using the snp2hla program (link below), but the results don't look good. I'm also a total novice at this stuff, so it's hard for me to see what's going on.

Here's what I did:

Converted 23andMe genome to plink format: plink2 --23file genome.txt familyid nameid M --out foo

Then I ran snp2hla: ./SNP2HLA.csh foo HM_CEU_REF foo2hla `which plink2` 2000 1000

This uses beagle and some awk scripts to produce a lot of files, including a Beagle gprobs file, a dosage file and a bgl.phased, among others. I haven't looked closely at the phased data, but I guess that's exactly what it is (I happen to know the correct phasing of the data, but I haven't spent time investigating).

The snp2hla program was originally made to be used with plink 1. Do you know if the file format has changed between plink 1 and 1.9/2?

Looking at the dosage file, I seem to get a 0.000% presence hit on all HLA antigens, which I find very weird. But I'm definitely seeing some imputed SNPs that aren't part of the genome.

The snp2hla package used to include a large reference panel (T1DGC), but it has been removed from the net for security (or, I guess, privacy) reasons.

So my questions are: Is it at all possible to detect the presence of HLA-B27 from a 23andMe genome using a reference panel (I guess it should be), and do you have any idea whether I'm doing something wrong?

Any hints will be appreciated!

Link to snp2hla:


snp plink imputation 23andme genome • 3.2k views
modified 3.2 years ago by Open Genomes0 • written 3.9 years ago by cslarsen0

HIBAG is an HLA genotype imputation tool. It can be used by researchers with published parameter estimates instead of requiring access to large training sample datasets.


written 3.9 years ago by zhengxwen0
3.9 years ago by
Lemire410 wrote:

A while back I wrote a paper on the topic, but I haven't touched the subject since. For now I can point you to reference 8; in there you will find a Supplemental Table that includes a list of SNPs and alleles that can be used to predict the classical HLA alleles, as well as r2 values between SNP alleles and HLA alleles. If you had a reference panel, that would be best, but in the absence of one, that table could be a decent starting point for you.

modified 3.9 years ago • written 3.9 years ago by Lemire410

The snp2hla program does come with a reference panel, but I'm still struggling to find _any_ HLA antigens in the output. I've had a look at the table you mentioned, but am currently scratching my head. I've found some relevant SNPs on SNPedia related to disease/condition phenotypes, but that's not really what I'm after. I just want to predict the presence or absence of the HLA-B*27 antigen (possibly HLA-B*5101 as well).

written 3.9 years ago by cslarsen0

I ran the test example provided with the package. From 

% grep HLA_B_2705 1958BC_IMPUTED.bgl.phased  

M HLA_B_2705 A A A A A A A A A P A A A A A A A A A A


you see that the 5th individual has "genotype" A P (columns 11 and 12; there are two columns per individual for genotypes). The B*2705 allele is thus inferred to be present (P) in one copy.
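If it helps, here is a minimal Python sketch of the same column counting, so you don't have to count by eye. It assumes the layout described above (a leading "M", the marker name, then two allele columns per individual); the function name copies_per_individual is just something I made up for illustration, not part of snp2hla or Beagle.

```python
def copies_per_individual(line, present="P"):
    """Count copies of the 'present' allele for each individual on one
    bgl.phased marker line (assumed layout: 'M', marker name, then two
    allele columns per individual)."""
    fields = line.split()
    marker, alleles = fields[1], fields[2:]
    if len(alleles) % 2 != 0:
        raise ValueError("expected two allele columns per individual")
    copies = [
        int(alleles[i] == present) + int(alleles[i + 1] == present)
        for i in range(0, len(alleles), 2)
    ]
    return marker, copies


# The HLA_B_2705 line from the test example above.
line = "M HLA_B_2705 A A A A A A A A A P A A A A A A A A A A"
marker, copies = copies_per_individual(line)

# Report carriers using 1-based individual numbers.
for number, count in enumerate(copies, start=1):
    if count > 0:
        print(f"{marker}: individual {number} carries {count} copy/copies")
```

On that line, only the 5th individual is reported, with one copy.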

Now to get an idea of the uncertainty, you may want to do 

% grep HLA_B_2705 1958BC_IMPUTED.bgl.gprobs 

HLA_B_2705 P A 0.000 0.076 0.924 0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.049 0.951 0.000 0.592 0.407 0.000 0.082 0.918 0.000 0.100 0.900 0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.050 0.950


These values indicate the probabilities for the 3 possible genotypes (PP: presence of 2 copies; AP: presence of 1 copy; and AA: absence; respectively). There are 3 values per individual. The fifth individual has values in columns 16, 17 and 18 (0.000 0.592 0.407), which indicates that the AP genotype (presence of one copy of B*2705) has probability .592 and the AA genotype (absence of B*2705) has probability .407. So the call for this individual is far from certain, and there is a substantial chance of misclassification.
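A rough Python sketch for pulling these triples out of a gprobs line, in case you want to check many alleles at once. It assumes the layout described above (marker name, the two allele labels, then three probabilities per individual in the order PP, PA, AA when the labels are "P A"). The function name genotype_probs and the 0.8 cutoff are my own choices for illustration.

```python
def genotype_probs(line):
    """Split one bgl.gprobs marker line into per-individual genotype
    probabilities (assumed layout: marker, allele 1, allele 2, then
    three probabilities per individual)."""
    fields = line.split()
    marker, first, second = fields[0], fields[1], fields[2]
    values = [float(x) for x in fields[3:]]
    if len(values) % 3 != 0:
        raise ValueError("expected three probabilities per individual")
    labels = (first + first, first + second, second + second)
    probs = [
        dict(zip(labels, values[i:i + 3]))
        for i in range(0, len(values), 3)
    ]
    return marker, probs


# The HLA_B_2705 line from the test example above.
line = ("HLA_B_2705 P A 0.000 0.076 0.924 0.000 0.000 1.000 0.000 0.000 1.000 "
        "0.000 0.049 0.951 0.000 0.592 0.407 0.000 0.082 0.918 0.000 0.100 0.900 "
        "0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.050 0.950")
marker, probs = genotype_probs(line)

# Probability of carrying at least one copy: P(PP) + P(PA).
carrier = [p["PP"] + p["PA"] for p in probs]

# Individuals (1-based) whose best genotype has probability below 0.8.
uncertain = [n for n, p in enumerate(probs, start=1) if max(p.values()) < 0.8]
print(marker, "uncertain calls for individuals:", uncertain)
```

Summing PP and PA gives the probability of carrying the allele at all, which is probably the number you care about if you only want presence or absence.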



modified 3.9 years ago • written 3.9 years ago by Lemire410

I think you're absolutely right. I've cross checked the same files, and they look correct here. I do get other combinations for other spots, so it looks alright. Thanks!


written 3.9 years ago by cslarsen0
3.2 years ago by
Open Genomes Foundation
Open Genomes0 wrote:

Yes, there are 23andMe SNPs that can be used to impute HLA-B*27.

SNPedia HLA-B27

Hopefully this will be helpful to you and everyone else who thinks they are HLA-B27 +. Also, everyone can check HLA-B51 as well, because that is the other major potentially pathogenic HLA-B allele:

written 3.2 years ago by Open Genomes0