VCFtools. Get rid of DS:GP phased genotypes while keeping GT and remaining fields
5.7 years ago
Mr Locuace ▴ 160

Hello, I have a question about VCFtools.

I have a VCF file with the usual 9 columns plus phased data for several samples. The phased data is in this format: GT:DS:GP (e.g., 0|0:0:1,0,0). I would like to get the original VCF file back, but with only the GT genotypes (0|0) in the sample columns.

With the VCFtools (v0.1.14) option "--extract-FORMAT-info GT" I get the GT genotypes, but only the CHROM and POS columns are kept.

If someone knows how to do this with this or another tool, it would be very helpful. Thank you.

VCFtools • 5.4k views
5.7 years ago

Good description of requirements. It would help if you could post some example input data. To retain only GT from the FORMAT field, try:

 bcftools annotate -x '^FORMAT/GT' test.vcf

P.S. Could you please edit the title, replacing "read of" with "rid of"?


@cpad0112, you should move it to an answer.

And this time I can show you a (slightly) shorter way to do this :)

$ bcftools annotate -x 'FORMAT' test.vcf

From the manual:

Similarly, "INFO" can be used to remove all INFO tags and "FORMAT" to remove all FORMAT tags except GT

You had me there, @finswimmer... but let me shorten it further:

$ bcftools annotate -x 'fmt' test.vcf

By the way, thanks for the bcftools tip, finswimmer.
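
For anyone who wants to check the result before running this on real data, here is a minimal sketch. The file names (bcf_demo.vcf, gt_only.vcf), the sample names and the single record are made up for illustration, and the whole thing is skipped if bcftools is not on the PATH.

```shell
# Sketch only: file names, sample names and the record are hypothetical.
if command -v bcftools >/dev/null 2>&1; then
  # Build a tiny two-sample VCF with GT:DS:GP sample fields.
  {
    printf '##fileformat=VCFv4.2\n'
    printf '##contig=<ID=1>\n'
    printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
    printf '##FORMAT=<ID=DS,Number=1,Type=Float,Description="Dosage">\n'
    printf '##FORMAT=<ID=GP,Number=G,Type=Float,Description="Genotype probabilities">\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n'
    printf '1\t100\trs1\tA\tG\t.\tPASS\t.\tGT:DS:GP\t0|0:0:1,0,0\t0|1:1:0,1,0\n'
  } > bcf_demo.vcf

  # Remove every FORMAT tag except GT and write the result to a new file.
  bcftools annotate -x '^FORMAT/GT' -o gt_only.vcf bcf_demo.vcf

  # List the FORMAT definitions that remain in the header.
  bcftools view -h gt_only.vcf | grep '^##FORMAT'

  # Print CHROM, POS and each sample's genotype.
  bcftools query -f '%CHROM\t%POS[\t%GT]\n' gt_only.vcf
else
  echo "bcftools not found on PATH; skipping demo" >&2
fi
```

The same check should work with the shorter `-x 'FORMAT'` form quoted from the manual above. Without `-o`, bcftools writes to standard output, so on a real file you will probably want `-o` or a redirect.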

5.7 years ago

If awk is also fine, you can do it like this:

$ awk -v FS="\t" -v OFS="\t" '/^#/ {print; next} {for(i=9;i<=NF;i++) {split($i, gt, ":"); $i=gt[1]} print}' input.vcf > output.vcf

From the FORMAT column (column 9) to the last column, awk splits each value on ":" and replaces it with the first element of the split. Because the VCF specification requires GT to be the first FORMAT key whenever it is present, this leaves GT in the FORMAT column and only the genotype in each sample column.

fin swimmer
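
A quick way to try the awk approach on a toy file before touching real data. This is only a sketch: the file names (awk_demo.vcf, awk_demo.gt.vcf), the sample names and the single record are made up, and the `/^#/` guard that passes header lines through unchanged is an extra precaution added for this example.

```shell
# Sketch only: file names, sample names and the record are hypothetical.
# Build a tiny two-sample VCF with GT:DS:GP sample fields.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=1>\n'
  printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
  printf '##FORMAT=<ID=DS,Number=1,Type=Float,Description="Dosage">\n'
  printf '##FORMAT=<ID=GP,Number=G,Type=Float,Description="Genotype probabilities">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n'
  printf '1\t100\trs1\tA\tG\t.\tPASS\t.\tGT:DS:GP\t0|0:0:1,0,0\t0|1:1:0,1,0\n'
} > awk_demo.vcf

# Header lines are printed as-is; on data lines, columns 9..NF keep only
# the text before the first ":" (the GT key and each sample's genotype).
awk -v FS="\t" -v OFS="\t" '
  /^#/ { print; next }
  {
    for (i = 9; i <= NF; i++) { split($i, f, ":"); $i = f[1] }
    print
  }' awk_demo.vcf > awk_demo.gt.vcf

# Show the data line: columns 9-11 are now GT, 0|0 and 0|1.
grep -v '^#' awk_demo.gt.vcf
```

Note that awk does not touch the header, so the ##FORMAT definitions for DS and GP stay in the output even though no record uses them any more. That is harmless for most tools, but worth knowing if you need a tidy header.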
