VCF processing for ML antigen prediction
14 days ago
Javier • 0

Hi, I'm working on a project that aims to predict the immunogenicity of neoantigens in patients from ICGC pancreatic cancer cohorts. We only have access to the patients' VCF files. From these I extracted the SNV and indel variants for each patient and filtered them to retain only variants affecting protein-coding genes (I have kept the unfiltered sets as well).
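The extraction and filtering step above can be sketched in a few lines of Python. This is a hypothetical illustration, not the pipeline actually used here: `coding_intervals` is an assumed input (for example, built from a GTF of protein-coding genes, with 1-based inclusive coordinates). In practice the SNV/indel split is usually done with `bcftools view -v snps,indels`, and gene overlap with an annotation tool.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def classify_variant(ref: str, alt: str) -> str:
    """Label one REF/ALT pair as 'SNV', 'indel', or 'other'."""
    # Symbolic alleles (<DEL>), '*', '.', and breakends are not plain bases.
    if not (ref.isalpha() and alt.isalpha()):
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide variants such as AC>GT


def filter_coding_variants(
    vcf_lines: Iterable[str],
    coding_intervals: dict[str, list[tuple[int, int]]],
) -> Iterator[tuple[str, int, str, str, str]]:
    """Yield (chrom, pos, ref, alt, type) for SNVs/indels inside coding intervals.

    coding_intervals is a hypothetical mapping chrom -> [(start, end), ...].
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        intervals = coding_intervals.get(chrom, [])
        if not any(start <= pos <= end for start, end in intervals):
            continue  # position falls outside every protein-coding interval
        for alt in alts.split(","):  # multi-allelic sites list several ALTs
            vtype = classify_variant(ref, alt)
            if vtype in ("SNV", "indel"):
                yield chrom, pos, ref, alt, vtype
```

Checking only the POS coordinate is a simplification: a deletion starting just before a gene boundary would be dropped even though it overlaps the gene.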

I'm currently stuck on one step. To compute the features the ML model needs, I need the patients' HLA haplotypes. However, HLA typing requires the underlying WGS/WES data, which we effectively don't have access to (we can't handle the BAM files due to memory constraints). I considered using bcftools consensus to generate FASTA files from the VCFs, but HLA typing software requires reads (FASTQ or BAM files), and I haven't found a way to produce those from VCFs.
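To illustrate why a consensus FASTA is not a substitute for reads: conceptually, a consensus step substitutes each ALT allele into the reference and returns a single sequence, with no per-read or coverage information left for an HLA typer to work with. The following is a minimal pure-Python sketch of that idea, assuming one contig held as a string and non-overlapping variants given as (1-based POS, REF, ALT) tuples. It is not how bcftools consensus is implemented.

```python
from __future__ import annotations


def apply_variants(reference: str, variants: list[tuple[int, str, str]]) -> str:
    """Return the reference with (pos, ref, alt) edits applied; pos is 1-based."""
    seq = reference
    # Apply from the rightmost variant leftwards so that indels do not shift
    # the coordinates of variants that have not been applied yet.
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        start = pos - 1  # convert 1-based VCF POS to a 0-based string index
        if seq[start:start + len(ref)] != ref:
            raise ValueError(f"REF {ref!r} does not match reference at position {pos}")
        seq = seq[:start] + alt + seq[start + len(ref):]
    return seq
```

The result is one edited sequence per call; nothing in it records how many reads supported each allele, which is the information read-based HLA typing relies on.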

I then learned about HLA imputation, but it also requires input formats such as BED, BIM, and FAM, which I haven't found a way to obtain from VCF files.
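The BED/BIM/FAM trio is the PLINK 1 binary genotype format, and the usual route from VCF is PLINK itself (for example, `plink --vcf input.vcf.gz --make-bed --out prefix` in PLINK 1.9). To show that these files are only a re-encoding of the per-sample genotype (GT) columns of a VCF, here is a minimal pure-Python sketch. It assumes biallelic, diploid GT fields, writes ALT as allele 1 and REF as allele 2 in the `.bim`, and skips sites-only VCFs (no sample columns) and multi-allelic sites. It is an illustration, not a replacement for PLINK.

```python
from __future__ import annotations

from typing import Iterable

BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 .bed header, SNP-major mode

# 2-bit genotype codes relative to the .bim alleles (A1 = ALT, A2 = REF).
GT_CODES = {"1/1": 0b00, "0/1": 0b10, "1/0": 0b10, "0/0": 0b11}
MISSING = 0b01


def vcf_to_plink(vcf_lines: Iterable[str]) -> tuple[bytes, list[str], list[str]]:
    """Return (bed_bytes, bim_lines, fam_lines) for biallelic diploid sites."""
    samples: list[str] = []
    bim: list[str] = []
    bed = bytearray(BED_MAGIC)
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the FORMAT column
            continue
        chrom, pos, vid, ref, alt = fields[:5]
        if len(fields) < 10 or "," in alt:
            continue  # no genotypes, or not biallelic
        gt_idx = fields[8].split(":").index("GT")
        codes = []
        for sample_field in fields[9:]:
            gt = sample_field.split(":")[gt_idx].replace("|", "/")  # ignore phase
            codes.append(GT_CODES.get(gt, MISSING))
        # Pack four samples per byte, first sample in the lowest-order bits.
        for i in range(0, len(codes), 4):
            byte = 0
            for j, code in enumerate(codes[i:i + 4]):
                byte |= code << (2 * j)
            bed.append(byte)
        vid = vid if vid != "." else f"{chrom}:{pos}"
        bim.append(f"{chrom}\t{vid}\t0\t{pos}\t{alt}\t{ref}")
    # FID, IID, father, mother, sex (0 = unknown), phenotype (-9 = missing)
    fam = [f"{s}\t{s}\t0\t0\t0\t-9" for s in samples]
    return bytes(bed), bim, fam
```

Writing `bed_bytes` to `prefix.bed` and the line lists to `prefix.bim` and `prefix.fam` gives the three files. The conversion only works if the VCF actually carries per-sample GT columns.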

So my question is: is there a way to obtain HLA haplotypes from VCF files alone?

formats vcf fastaq HLA_imputation HLA_typing