Question

Imputing the presence of the HLA-B27 antigen in an individual genome

8.5 years ago

cslarsen

Hi,

Is it possible to impute the presence of the HLA-B27 antigen from a 23andMe genome? The genome is of European origin.

I've tried using the snp2hla program (link below), but the results don't look very good. I'm also a complete novice at this, so it's hard for me to see what's going on.

Here's what I did:

First, I converted the 23andMe genome to PLINK format:

plink2 --23file genome.txt familyid nameid M --out foo

Then I ran snp2hla:

./SNP2HLA.csh foo HM_CEU_REF foo2hla `which plink2` 2000 1000

This uses Beagle and some awk scripts to produce many files, including a Beagle .gprobs file, a dosage file and a .bgl.phased file. I haven't looked closely at the phased data, but I assume it is exactly what the name says (I happen to know the true phase of my data, but haven't spent time investigating).
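For readers unfamiliar with the positional arguments in the SNP2HLA.csh call, the small Python sketch below builds the same command with named parameters. The argument order (input fileset, reference panel, output prefix, plink path, then optional memory in MB and window size) is my reading of the SNP2HLA README and should be checked against the copy you downloaded; the helper name snp2hla_command is hypothetical, and its defaults simply mirror the values used in the question.

```python
import shlex
import shutil


def snp2hla_command(data, reference, output, plink, max_memory_mb=2000,
                    window_size=1000):
    """Hypothetical helper: argv for SNP2HLA.csh.

    Argument order is ASSUMED from the SNP2HLA README; defaults copy the
    values used in the question, not necessarily SNP2HLA's own defaults.
    """
    return [
        "./SNP2HLA.csh",
        data,                # input PLINK binary fileset prefix (.bed/.bim/.fam)
        reference,           # reference panel prefix, e.g. HM_CEU_REF
        output,              # prefix for all output files
        plink,               # path to the plink executable
        str(max_memory_mb),  # optional: memory limit in MB
        str(window_size),    # optional: marker window size
    ]


# Same call as in the question; falls back to the bare name if plink2 is not on PATH.
plink = shutil.which("plink2") or "plink2"
print(shlex.join(snp2hla_command("foo", "HM_CEU_REF", "foo2hla", plink)))
```

The returned list can be passed straight to subprocess.run(cmd, check=True), which avoids shell-quoting problems with paths that contain spaces.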

snp2hla was originally written for PLINK 1. Do you know whether the file formats changed between PLINK 1 and 1.9/2?

Looking at the dosage file, I seem to get a presence value of 0.000 for every HLA antigen, which I find very odd. I am, however, definitely seeing some imputed SNPs that aren't in the original genome.
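One thing worth checking when imputation output looks empty: as far as I understand it, SNP2HLA can only use markers that the input shares with the reference panel, so if few 23andMe markers overlap with the panel there is little information to impute from. The Python sketch below counts that overlap by variant ID (column 2 of a PLINK .bim file). The file names foo.bim and HM_CEU_REF.bim are assumptions based on the commands above, and the sketch does not check allele coding or strand, which can also cause markers to be dropped.

```python
def bim_variant_ids(lines):
    """Variant IDs (2nd column) from the lines of a PLINK .bim file."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            ids.add(fields[1])
    return ids


def marker_overlap(sample_lines, reference_lines):
    """(sample markers, reference markers, shared markers), matched by ID only.

    Allele coding and strand are NOT checked here.
    """
    sample = bim_variant_ids(sample_lines)
    reference = bim_variant_ids(reference_lines)
    return len(sample), len(reference), len(sample & reference)


def report(sample_bim_path, reference_bim_path):
    """Print the overlap between two .bim files on disk."""
    with open(sample_bim_path) as s, open(reference_bim_path) as r:
        n_sample, n_ref, n_shared = marker_overlap(s, r)
    print(f"{n_shared} of {n_ref} reference markers are among "
          f"{n_sample} sample markers")
```

Usage, assuming those file names: report("foo.bim", "HM_CEU_REF.bim"). Note that 23andMe uses internal "i..." IDs for some markers, and those will never match rsIDs in a reference panel.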

snp2hla used to ship with a large reference panel (T1DGC), but it has been taken offline, presumably for privacy reasons.

So my questions are: is it possible at all to detect HLA-B27 from a 23andMe genome using a reference panel (I'd guess it should be), and do you have any idea whether I'm doing something wrong?

Any hints will be appreciated!

Link to snp2hla: https://www.broadinstitute.org/mpg/snp2hla/

Tags: plink, imputation, SNP, 23andme, genome

Written 8.5 years ago by cslarsen; last edited 20 months ago by Ram.

HIBAG is an HLA genotype imputation tool that can be used with published parameter estimates (http://www.biostat.washington.edu/~bsweir/HIBAG/ ) instead of requiring access to large training datasets.

Written 8.4 years ago by zhengxwen; last edited 4.4 years ago by Ram.

Ram · Answer 1 · 2015-11-11

8.5 years ago

Lemire

A while back I wrote a paper on the topic, but I haven't worked on the subject since. For now I can point you to reference 8, which includes a Supplemental Table listing SNPs and alleles that can be used to predict the classical HLA alleles, along with r² values between the SNP alleles and the HLA alleles. A reference panel would be best, but without one, that table could be a decent starting point.
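For context on the r² values: the usual LD r² between a SNP allele and an HLA allele is computed from phased haplotypes as r² = D² / (p_A (1 - p_A) p_B (1 - p_B)), with D = p_AB - p_A p_B. The Python sketch below implements that definition from 0/1 haplotype indicators. I'm assuming the table uses this standard definition, and the toy haplotypes are made up for illustration. An r² close to 1 means the SNP allele is a good stand-in for the HLA allele.

```python
def ld_r2(snp_haps, hla_haps):
    """LD r^2 between two biallelic markers from phased haplotypes.

    Each argument is a list of 0/1 indicators, one per haplotype
    (1 = carries the allele of interest). r^2 = D^2 / (pA qA pB qB).
    """
    if len(snp_haps) != len(hla_haps) or not snp_haps:
        raise ValueError("need two equal-length, non-empty haplotype lists")
    n = len(snp_haps)
    p_a = sum(snp_haps) / n                  # SNP allele frequency
    p_b = sum(hla_haps) / n                  # HLA allele frequency
    p_ab = sum(a and b for a, b in zip(snp_haps, hla_haps)) / n  # joint frequency
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("an allele is monomorphic; r^2 is undefined")
    d = p_ab - p_a * p_b                     # linkage disequilibrium D
    return d * d / denom


# Toy haplotypes (made up): 3 carry the SNP allele, 2 of those carry HLA-B*27.
snp = [1, 1, 0, 0, 0, 0, 1, 0]
b27 = [1, 1, 0, 0, 0, 0, 0, 0]
print(round(ld_r2(snp, b27), 3))  # -> 0.556
```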

Written 8.5 years ago by Lemire; last edited 4.4 years ago by Ram.

The snp2hla program does come with a reference panel, but I'm still struggling to find _any_ HLA antigens in the output. I've had a look at the table you mentioned, but am currently scratching my head. I've found some relevant SNPs on SNPedia related to disease/condition phenotypes, but that's not really what I'm after. I just want to predict the presence or absence of the HLA-B*27 antigen (possibly HLA-B*5101 as well).

Written 8.5 years ago by cslarsen.

I ran the test example provided with the package. From

% grep HLA_B_2705 1958BC_IMPUTED.bgl.phased
M HLA_B_2705 A A A A A A A A A P A A A A A A A A A A

you can see that the 5th individual has the "genotype" A P (columns 11 and 12; each individual has two columns, one per haplotype). The B*2705 allele is therefore inferred to be present (P) in one copy.
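The column counting above is easy to get wrong by hand. Below is a small Python sketch that reads one data line of a .bgl.phased file, assuming the layout shown (a marker-type code, the marker name, then two alleles per individual). The helper names phased_alleles and carriers are hypothetical.

```python
def phased_alleles(line, individual):
    """(marker, allele 1, allele 2) for a 1-based individual from a Beagle
    .bgl.phased data line: 'M <marker>' then two alleles per individual."""
    fields = line.split()
    marker, alleles = fields[1], fields[2:]
    i = 2 * (individual - 1)
    if individual < 1 or i + 2 > len(alleles):
        raise IndexError("individual not present on this line")
    return marker, alleles[i], alleles[i + 1]


def carriers(line, present="P"):
    """1-based indices of individuals with at least one 'present' allele."""
    alleles = line.split()[2:]
    return [k // 2 + 1 for k in range(0, len(alleles), 2)
            if present in (alleles[k], alleles[k + 1])]


line = "M HLA_B_2705 A A A A A A A A A P A A A A A A A A A A"
print(phased_alleles(line, 5))  # -> ('HLA_B_2705', 'A', 'P')
print(carriers(line))           # -> [5]
```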

To get an idea of the uncertainty, you may want to run

% grep HLA_B_2705 1958BC_IMPUTED.bgl.gprobs
HLA_B_2705 P A 0.000 0.076 0.924 0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.049 0.951 0.000 0.592 0.407 0.000 0.082 0.918 0.000 0.100 0.900 0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.050 0.950

These values are the probabilities of the three possible genotypes, in the order PP (two copies present), AP (one copy present) and AA (absent); there are three values per individual. The fifth individual's values are in columns 16, 17 and 18 (0.000 0.592 0.407): the AP genotype (one copy of B*2705) has probability 0.592 and the AA genotype (no B*2705) has probability 0.407. So there is still a substantial chance of misclassification.
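The same reading can be scripted for the .gprobs line. This Python sketch pulls out one individual's three probabilities, assuming the layout shown (marker, the two alleles, then three probabilities per individual in the PP, AP, AA order just described), and summarizes them as carrier probability, expected copy number and most likely genotype. The helper names and the 0.9 "confident call" threshold are illustrative choices, not something SNP2HLA or Beagle defines.

```python
def gprobs_for(line, individual):
    """(marker, first allele, second allele, probs) for a 1-based individual
    from a Beagle .gprobs data line. probs is (first/first, first/second,
    second/second); with alleles 'P A' that is (PP, AP, AA)."""
    fields = line.split()
    marker, first, second = fields[0], fields[1], fields[2]
    values = [float(x) for x in fields[3:]]
    i = 3 * (individual - 1)
    if individual < 1 or i + 3 > len(values):
        raise IndexError("individual not present on this line")
    return marker, first, second, tuple(values[i:i + 3])


def summarize(probs, threshold=0.9):
    """Summarize (PP, AP, AA) probabilities for a presence/absence marker.

    threshold is an arbitrary illustrative cut-off for a "confident" call.
    """
    pp, ap, _aa = probs
    best_prob, best_call = max(zip(probs, ("PP", "AP", "AA")))
    return {
        "p_carrier": pp + ap,            # probability of at least one copy
        "expected_copies": 2 * pp + ap,  # dosage of the presence allele
        "best_call": best_call,
        "confident": best_prob >= threshold,
    }


line = ("HLA_B_2705 P A 0.000 0.076 0.924 0.000 0.000 1.000 0.000 0.000 1.000 "
        "0.000 0.049 0.951 0.000 0.592 0.407 0.000 0.082 0.918 0.000 0.100 0.900 "
        "0.000 0.000 1.000 0.000 0.000 1.000 0.000 0.050 0.950")
marker, first, second, probs = gprobs_for(line, 5)
print(marker, probs, summarize(probs))
```

For individual 5 this gives AP as the most likely genotype (probability 0.592) and marks the call as not confident, in line with the misclassification concern.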

Written 8.5 years ago by Lemire; last edited 4.4 years ago by Ram.

I think you're absolutely right. I've cross-checked the same files, and they look correct here. I do get other combinations at other markers, so it looks alright. Thanks!

Written 8.5 years ago by cslarsen; last edited 4.4 years ago by Ram.

score 0 · Answer 2 · 2016-07-25

Yes, there are 23andMe SNPs that can be used to impute HLA-B*27.

SNPedia: HLA-B27
https://www.23andme.com/you/explorer/snp/?snp_name=rs13202464
https://www.23andme.com/you/explorer/snp/?snp_name=rs4349859

Hopefully this helps you and anyone else who suspects they are HLA-B27 positive. You can also check HLA-B51, the other major potentially pathogenic HLA-B allele:

https://www.23andme.com/you/explorer/snp/?snp_name=rs79556279
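If you have the raw 23andMe download rather than the web explorer, a minimal Python sketch like the one below pulls out the genotypes at those rsIDs and counts copies of a tag allele. It assumes the usual raw-data layout ('#' comment lines, then tab-separated rsid, chromosome, position, genotype). The tag alleles in TAG_ALLELES (A for rs4349859, G for rs13202464) are an assumption to verify on SNPedia before relying on them; rs79556279 is left out because its HLA-B51 tag allele is not confirmed here. The sample data is made up, and a tag SNP is only a proxy, so carrying the tag allele is not proof of HLA-B27.

```python
import io

# ASSUMED tag alleles for HLA-B*27 (verify on SNPedia before use).
TAG_ALLELES = {"rs4349859": "A", "rs13202464": "G"}


def read_23andme(handle, wanted):
    """Return {rsid: genotype} for the wanted rsIDs in a 23andMe raw file.

    Assumes '#' comment lines followed by tab-separated
    rsid, chromosome, position, genotype.
    """
    found = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 4 and fields[0] in wanted:
            found[fields[0]] = fields[3]
    return found


def tag_allele_copies(genotype, tag_allele):
    """Copies of tag_allele in a genotype like 'AG'; None if missing or no-call."""
    if not genotype or genotype == "--":
        return None
    return genotype.count(tag_allele)


# Made-up sample in the raw-file layout (dummy positions and genotypes).
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4349859\t6\t0\tAG\n"
    "rs13202464\t6\t0\tAA\n"
)
genotypes = read_23andme(sample, TAG_ALLELES)
for rsid, tag in TAG_ALLELES.items():
    print(rsid, genotypes.get(rsid), tag_allele_copies(genotypes.get(rsid), tag))
```

With a real download, open the file (for example genome.txt from the question) and pass the handle to read_23andme in place of the StringIO sample.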