QUESTION: Missing genotype for Beagle imputation
0
2
Entering edit mode
4.7 years ago
Lily Wang ▴ 20

Hi Biostars,

I'm trying to impute missing data in my vcf file with variants from 141 samples called by Freebayes with Beagle 4 (the very recent version, released on Jan 21, 2017) for my downstream GWAS, and it gave me the error:

"ERROR: Missing one or both alleles for a genotype:"

I tried to use only biallelic SNPs since there is a post mentioned Beagle can only handle SNP variants, it still did not work. I also tried a test vcf file with variants I called from one chromosome by GATK Haplotypecaller, it showed the same error. This is the same question as Missing alleles for a genotype in UYG VCF file, however, I tried both way and it did not work out. The vcffilterjs has moved to picard, and if I use

\$ java -jar picard.jar FilterVcf -e 'function accept(v) { var i;for(i=0;i< v.getNSamples();i++) { var g=v.getGenotype(i); if(g.isCalled() && g.getAlleles().size()!=2) return false;} return true;} accept(variant);' test.vcf > test.rm.vcf


It only shows options for FilterVcf.

The second suggestion doesn't work for both my files (called by freebayes and GATK), it resulted only the head lines of vcf file.

In my vcf file, freebayes calling line is like this (with multiple samples): Pf3D7_04_v3 3 . C A 1.79263 . AB=0;ABP=0;AC=5;AF=0.0943396;AN=53;AO=7;CIGAR=1X;DP=85;DPB=85;DPRA=1.41435;EPP=10.7656;EPPR=56.9074;GTI=4;LEN=1;MEANALT=1;MQM=19.4286;MQMR=18.0128;NS=53;NUMALT=1;ODDS=0.677882;PAIRED=1;PAIREDR=0.897436;PAO=0;PQA=0;PQR=0;PRO=0;QA=249;QR=2669;RO=78;RPL=0;RPP=18.2106;RPPR=172.385;RPR=7;RUN=1;SAF=6;SAP=10.7656;SAR=1;SRF=61;SRP=56.9074;SRR=17;TYPE=snp GT:DP:AD:RO:QR:AO:QA:GL 0:1:1,0:1:37:0:0:0,-1.29836 0:1:1,0:1:40:0:0:0,-2.48652 1:1:0,1:0:0:1:35:-0.799271,0 (and more)

And here is the sample in vcf file by GATK (only from one sample as a test):

Pf3D7_07_v3 1951    .   G   T   113 .   AC=1;AF=1.00;AN=1;BaseQRankSum=-0.792;ClippingRankSum=0.000;DP=15;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=32.74;MQRankSum=0.319;QD=14.13;ReadPosRankSum=-0.921;SOR=0.693 GT:AD:DP:GQ:PL  1:2,6:8:99:143,0


This is my first post and apparently I haven't gotten the skill of writing a neat one. I'm a newbie of bioinformatics. I'm now stuck here. Can anyone please help me? Thank you.

Lily

software error • 4.3k views
2
Entering edit mode

I got the same error but the problem was the missing genotype field (GT) that is encoded as "." (single dot). I just fix this by replacing the GT field with "./." !

zcat input.vcf.gz | perl -pe "s/\s\.:/\t.\/.:/g" | bgzip -c >output.vcf.gz

0
Entering edit mode

Thank you Frédéric! I tried but still got the same error. I think I first need to figure out how to convert haploid data to homozygous diploid data, and them may use your method to replace the GT field; or I have to re-call the variants using '-ploidy 2' and filter out heterozygous variants and then impute.

0
Entering edit mode

hello, i have the same problem when imputing with beagle. In GT field of my vcf file i see "." instead of .|., and i tried with the command you mentioned here but still it remains . after using that command. Can you help me in this issue. thanks, vinay kumar reddy nannuru

0
Entering edit mode

there is no such option '-e' in Picard/FilterVcf. Please, read the doc.

0
Entering edit mode

Thank you Pierre. My bad, I directly copied the one from your answer. I think I need to use "JS=File" argument in Picard/FilterVcf. I wish I knew more on java, but I know nearly nothing. Can you please explain clearly how to prepare a such file? I don't really understand the explanation in doc. Thank you again!

0
Entering edit mode

just copy the script in a text file...

0
Entering edit mode

furthermore, you can still use vcffilterjs...

0
Entering edit mode

Hi Pierre, thank you! Picard/FilterVcf did work out for me, I think still I haven't got the right way. But as you said, I can still use vcffilterjs. It showed me "WARNING: NO CSQ found in header. This VCF was probably NOT annotated with VEP.". I annotated my vcf via snpEff. I will try VEP though. I also tried the one without annotation. While for both the results files only have head lines, everything else was filtered out.

0
Entering edit mode

this is just a warning (for people wanting to use VEP/SnpEff annotations).

everything else was filtered out.

so the javascript expression filter out everything...

0
Entering edit mode

the expression

 g.getAlleles().size()!=2


will filter out any variant having a called + non-diploid genotype so

 GT:AD:DP:GQ:PL  1:2,6:8:99:143,


will be filtered-out (GT=1 == haploid )

0
Entering edit mode

That makes a lot of sense. Mine is haploid data, and I used the -ploidy 1 option. I did try converting the haploid data to diploid using "bcftools convert -haploid2dipoid". Still Beagle cannot impute and your script will filter out everything. Do you have any idea how should I do? Thank you!!!

0
Entering edit mode

Correct. Only left the head lines with ## and #, in both files with and without annotation. I'm totally lost.