What is the current standard for HLA allele typing prediction from SNP data?
4.2 years ago
sidtjn ▴ 20

I am currently exploring HIBAG (https://doi.org/10.1038/tpj.2013.18), but I was wondering whether this R package is considered the best way to predict HLA alleles from SNP data. I know that HLA-check exists (https://doi.org/10.1186/s12859-017-1746-1), but I haven't started reading through the paper yet.

The population that I'm working with is from Southeast Asia. I am also relatively new to Bioinformatics (~6 months).

SNP R • 1.5k views

I hope to see an update to this question. Has a better solution been developed since?

I don't think there is a clear "leader" among the popular tools. It may be of interest that one of the most recent pipelines from a major group in this space is HLA-TAPAS: https://github.com/immunogenomics/HLA-TAPAS

4.2 years ago

I am not sure if "standard" is exactly the right word, even though I can imagine that is what a lot of people may use.

However, you can see the results and concordance for my own SNP chip and sequencing data (with 2 methods each; SNP2HLA and HIBAG for SNP chip; bwakit and HLAminer for Illumina sequencing):

http://cdwscience.blogspot.com/2019/08/predicting-hla-types-for-array-and-high.html

I think the HLA-A, HLA-B, and HLA-C results were more robust. So, aside from one example where 2 SNPs did a better job of matching the sequencing results (compared to the microarray imputation strategies), I am not exactly sure what to tell you about the other HLA genes.

So am I reading correctly that your sequence-based calls were from "Genos Exome", and that this is just short-read whole-exome sequencing? You then did the analysis with HLAminer? How well does this work? Is WES sufficient for accurate calls? Would you consider the NGS-based calls the "truth set" over the SNP-array-based calls? I have read that long-read sequencing is the way to go for HLA typing, although I am new to the topic.

It would probably be best to look for similar results in more samples.

However, what I can say is that different methods had similar results for the HLA-A, HLA-B, and HLA-C genes for myself, but I got different results for the HLA-D genes.

If you consider the HLA types defined from the 2 SNPs, that means the Exome / WGS data may be better than the microarray imputations (and it also means that 23andMe decided to use those 2 SNPs over the microarray imputations in what they return to customers, even though there are some papers that might use the imputations).

For some applications, I have seen some advantages for the longer read data. However, I don't have longer read sequencing for myself. Also, if you use amplicons, you need to be careful about certain sequences that may be completely missed from the targeted sequencing (and I think long-read WGS is noticeably more expensive).

The "truth" dataset can sometimes be tricky to define (and I think continual collection of results is probably very important, not just a test with a limited sample size), and I would guess the performance can vary between applications. However, if you look at consistency, there were certain HLA types (A/B/C) that were more likely to be consistent among the 4 strategies that I tested (and I would expect those are probably more robust if you continued to test strategies).
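
As a rough illustration of the kind of consistency check described above, here is a minimal Python sketch that compares the calls from two strategies gene by gene. It is not taken from any of the tools mentioned: the function names are hypothetical, the allele calls are made up for the example, and it assumes each method reports two alleles per gene in the usual `GENE*field:field` notation. Calls are trimmed to two fields before comparison, since imputation and sequencing tools often report different resolutions.

```python
from collections import Counter

def normalize(allele, fields=2):
    """Trim an allele name such as 'A*02:01:01' to the given number of fields."""
    gene, _, rest = allele.partition("*")
    return gene + "*" + ":".join(rest.split(":")[:fields])

def shared_alleles(calls_a, calls_b, fields=2):
    """Count alleles (0, 1, or 2) shared by two unordered genotype calls."""
    a = Counter(normalize(x, fields) for x in calls_a)
    b = Counter(normalize(x, fields) for x in calls_b)
    # Multiset intersection handles homozygous calls correctly
    return sum((a & b).values())

def concordance(method_a, method_b, fields=2):
    """Per-gene fraction of matching alleles for genes called by both methods."""
    return {
        gene: shared_alleles(method_a[gene], method_b[gene], fields) / 2
        for gene in sorted(method_a.keys() & method_b.keys())
    }

# Hypothetical calls, made up for illustration only (not real results)
array_calls = {
    "A": ("A*02:01", "A*24:02"),
    "B": ("B*07:02", "B*15:01"),
    "DRB1": ("DRB1*04:01", "DRB1*15:01"),
}
seq_calls = {
    "A": ("A*24:02:01", "A*02:01:01"),
    "B": ("B*07:02:01", "B*15:01:01"),
    "DRB1": ("DRB1*04:04:01", "DRB1*15:01:01"),
}

result = concordance(array_calls, seq_calls)
print(result)
```

With these made-up inputs, the class I genes agree fully and DRB1 agrees on one of two alleles. Repeating this over many samples and method pairs would give a per-gene picture of which calls are stable, without having to declare any single method the truth.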

4.2 years ago

I would recommend taking a look here, and seeing what you can find: https://omictools.com/search?q=hla

These HLA tools have a tendency to go out of use / go un-maintained.

In the past, I have used Polysolver and HLAscan, but these work from FASTQ / NGS reads.

Kevin
