Missing alleles for a genotype in UYG VCF file
5.2 years ago
Neilfws 49k

I'm trying to use Beagle 4.1 to phase a VCF file from Illumina's UYG program, using 1000 Genomes phase 3 as a reference panel. It's failing with the error:

ERROR: Missing one or both alleles for a genotype:

Indeed, when I examine the VCF file I see lines like this one (scroll right):

chr12   1899470 .       C       T       239     PASS    SNVSB=-26.8;SNVHPOL=3;CSQ=T||NM_172364.4|Transcript|downstream_gene_variant|||||||||CACNA2D4|||||1653|YES||||NP_758952.4|||||,T||NM_024551.2|Transcript|downstream_gene_variant|||||||||ADIPOR2|||||1625|YES||||NP_078827.2|||||        GT:GQ:GQX:DP:DPF:AD     1:33:33:18:2:0,17

where the value for GT = 1.

Questions:

  • Is GT = 1 valid VCF? I had the impression it was not.
  • Is there a smart way to make Beagle ignore these lines? I didn't see anything in the documentation.
  • Or is there a way to remove these lines in preprocessing, e.g. with vcftools?
vcf uyg illumina genotyping • 2.4k views

One way to remove these lines is vcffilterjs (https://github.com/lindenb/jvarkit/wiki/VCFFilterJS): drop every record that has at least one called genotype where num(alleles) != 2.

$ java -jar jvarkit/dist/vcffilterjs.jar -e 'function accept(v) { var i;for(i=0;i< v.getNSamples();i++) { var g=v.getGenotype(i); if(g.isCalled() && g.getAlleles().size()!=2) return false;} return true;} accept(variant);' input.vcf > out.vcf
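
If a Java toolchain is not at hand, the same rule can be sketched in plain Python. This is only a rough equivalent, not part of jvarkit: the function names `has_only_diploid_calls` and `filter_vcf` are made up for illustration, the input is assumed to be an uncompressed, tab-separated VCF, and a genotype made up entirely of "." is treated as uncalled, which is meant to approximate the `isCalled()` check above.

```python
def has_only_diploid_calls(line):
    """Return True if every called genotype on a VCF data line has exactly two alleles."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return True  # no sample columns, nothing to check
    keys = fields[8].split(":")
    if "GT" not in keys:
        return True  # no genotype field on this record
    gt_index = keys.index("GT")
    for sample in fields[9:]:
        parts = sample.split(":")
        gt = parts[gt_index] if gt_index < len(parts) else "."
        # Treat phased ("|") and unphased ("/") separators the same way.
        alleles = gt.replace("|", "/").split("/")
        if all(a == "." for a in alleles):
            continue  # uncalled genotype, skipped like !isCalled()
        if len(alleles) != 2:
            return False
    return True


def filter_vcf(lines):
    """Yield header lines unchanged and only those records that pass the check."""
    for line in lines:
        if line.startswith("#") or has_only_diploid_calls(line):
            yield line
```

To use it as a filter, pass an open file (or `sys.stdin`) to `filter_vcf` and write the result with `writelines`.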
5.2 years ago

Well, that's strange. Anyway, you can exclude the records whose genotype is "1" with bcftools:

bcftools view -e 'GT="1"' file.vcf.gz
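
Before dropping anything, it may be worth checking where the single-allele calls sit. The VCF specification does allow a one-allele GT for haploid calls (for example chrY, or non-pseudoautosomal chrX in a male), so such records on the sex chromosomes would be expected, while one on chr12 is the odd case. The sketch below counts them per chromosome; `count_haploid_calls` is a hypothetical helper, and it assumes a single-sample, uncompressed, tab-separated VCF.

```python
from collections import Counter


def count_haploid_calls(lines):
    """Count, per chromosome, records whose first sample has a one-allele GT."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample column
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        values = fields[9].split(":")
        gt_index = keys.index("GT")
        if gt_index >= len(values):
            continue  # malformed sample column
        gt = values[gt_index]
        # A called haploid genotype has no separator and is not the missing value.
        if gt != "." and "/" not in gt and "|" not in gt:
            counts[fields[0]] += 1
    return counts
```

Calling it on an open file handle returns a `Counter` keyed by the CHROM column, which makes it easy to see whether the problem records are confined to chrX/chrY or spread across autosomes.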