Question: Error converting VCF to .hap format
Asked 7 weeks ago by anailis (Edinburgh):

Hello. I am trying to convert a .vcf containing phased data to SHAPEIT .hap/.sample format using bcftools or Plink2.

bcftools convert input.vcf --hapsample output.hap

OR

plink2 --vcf input.vcf --export haps --out output.hap

PLINK gives the error "--export haps cannot be used with missing genotype calls", and bcftools gives the error "FORMAT/GT tag not present at [chr.no]:[SNP:id]". My VCF stores genotype probabilities (GP), not genotype calls (GT). It was produced from a .bgen file using qctool:

qctool -g input.bgen -s input.sample -incl-rsids snps.txt -incl-samples sample.txt -og output.vcf

A snapshot of my .vcf looks like:

##fileformat=VCFv4.2
##FORMAT=<ID=GP,Type=Float,Number=G,Description="Genotype call probabilities">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  [sample-id] ...
.   11944   rs10172629,2    C   T   .   .   .   GP  [sample-genotype] ...

The sample fields hold values such as 0,1,0,1, where 0 and 1 refer to the possible alleles.
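Both error messages point at the same gap: neither tool finds a GT key in the FORMAT column. A minimal sketch of that check, assuming a tab-separated VCF data line (as the VCF spec requires; the snapshot above is space-aligned) with a single sample whose GP value is made up for illustration. `format_fields` is a hypothetical helper, not part of bcftools or plink2:

```python
def format_fields(vcf_line: str) -> dict:
    """Map each FORMAT key to the first sample's value for one VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; the first sample column follows it.
    keys = cols[8].split(":")
    values = cols[9].split(":")
    return dict(zip(keys, values))


# Hypothetical record modelled on the snapshot above (GP value invented).
line = "\t".join([".", "11944", "rs10172629,2", "C", "T", ".", ".", ".", "GP", "0,1,0"])
fields = format_fields(line)
print(fields)          # {'GP': '0,1,0'}
print("GT" in fields)  # False: there is no hard call for either tool to read
```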


Added plink tag.

Do you need the GPs or would you prefer GTs? To obtain GTs, I think that you just need to add the threshold parameter to the qctool command.
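Thresholding amounts to keeping the most probable genotype when its probability is high enough and treating the call as missing otherwise. A minimal Python sketch of that idea for a biallelic diploid site, assuming GP values in VCF Number=G order (0/0, 0/1, 1/1); `hard_call` and its 0.9 default are hypothetical illustrations, not qctool's actual rule:

```python
from typing import Sequence

# Genotype labels in VCF Number=G order for a biallelic diploid site.
GENOTYPES = ("0/0", "0/1", "1/1")


def hard_call(gp: Sequence[float], threshold: float = 0.9) -> str:
    """Return the most probable genotype if its probability reaches
    `threshold`, otherwise a missing call ("./.")."""
    if len(gp) != len(GENOTYPES):
        raise ValueError("expected three GP values for a biallelic diploid site")
    best = max(range(len(gp)), key=lambda i: gp[i])
    return GENOTYPES[best] if gp[best] >= threshold else "./."


print(hard_call([0.0, 1.0, 0.0]))     # 0/1
print(hard_call([0.05, 0.93, 0.02]))  # 0/1
print(hard_call([0.3, 0.5, 0.2]))     # ./.
```

The calls come out unphased (`/`), and anything below the threshold becomes missing, which `--export haps` would still reject.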

Also, in place of qctool, you could try shapeit -convert to produce the VCF.

Comment by Kevin Blighe, 7 weeks ago
Answer by chrchang523 (United States), 7 weeks ago:

Two issues:

  • When importing a VCF, plink2's default behavior is to just look at the GT field, since (i) that corresponds to what most researchers use plink for, and (ii) there are several different ways to represent dosages and genotype posterior probabilities, and those standards are continuing to evolve. You need to replace "--vcf input.vcf" with "--vcf input.vcf dosage=GP" to import GP values instead.
  • The .hap file format requires every single genotype call to be phased. GP cannot represent genotype phase at all! So you'll either need to phase your data first (e.g. with SHAPEIT4) or obtain a different input file.
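Both points can be illustrated with a short Python sketch. `gp_to_alt_dosage` uses the usual expected-count convention, P(0/1) + 2 * P(1/1); plink2's exact handling of `dosage=GP` may differ in details. `gt_to_hap_alleles` is a hypothetical helper showing why a haplotype file needs phased calls: `0|1` and `1|0` put the ALT allele on different haplotypes, yet both are the same unphased heterozygote with the same GP triple (0, 1, 0), so GP alone cannot say which haplotype carries which allele.

```python
from typing import Sequence, Tuple


def gp_to_alt_dosage(gp: Sequence[float]) -> float:
    """Expected ALT-allele count from (P(0/0), P(0/1), P(1/1))."""
    if len(gp) != 3:
        raise ValueError("expected three GP values for a biallelic diploid site")
    p_het, p_hom_alt = gp[1], gp[2]
    return p_het + 2.0 * p_hom_alt


def gt_to_hap_alleles(gt: str) -> Tuple[str, str]:
    """Split a phased diploid GT such as '0|1' into its two haplotype alleles.

    Unphased ('0/1') or missing ('.|.') calls are rejected, because a
    haplotype file must record which allele sits on which haplotype.
    """
    if "|" not in gt:
        raise ValueError(f"unphased genotype {gt!r}: allele-to-haplotype assignment unknown")
    a, b = gt.split("|")
    if "." in (a, b):
        raise ValueError(f"missing genotype {gt!r}")
    return a, b


print(gp_to_alt_dosage([0.25, 0.5, 0.25]))  # 1.0
print(gt_to_hap_alleles("1|0"))             # ('1', '0')
try:
    gt_to_hap_alleles("0/1")
except ValueError as err:
    print(err)  # reports the unphased call
```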

Thank you for your response. I think I should have been working with GT.

Reply by anailis, 7 weeks ago