Question: VCFtools: get rid of DS:GP in phased genotypes while keeping GT and the remaining fields

Mr Locuace (Chile) wrote, 15 months ago:

Hello, I have a question about VCFtools.

I have a VCF file with the usual 9 fixed columns plus phased data for several samples. The phased data are in the format GT:DS:GP (e.g., 0|0:0:1,0,0). I would like to get back the original VCF file, but with only the GT genotypes (e.g., 0|0).

With the VCFtools (v0.1.14) option "--extract-FORMAT-info GT" I get the GT genotypes, but only together with the CHROM and POS columns; all the other VCF columns are lost.

If anyone knows how to do this with VCFtools or another tool, it would be very helpful. Thank you!

cpad0112 (India) wrote, 15 months ago:

Good description of the requirements. It would help if you could post some example input data. To retain only GT in the FORMAT field, try:

 bcftools annotate -x '^FORMAT/GT' test.vcf

PS: could you please edit the title, replacing "read of" with "rid of"?

@cpad0112 you should move this to an answer.

And this time I can show you a (slightly) shorter way :)

$ bcftools annotate -x 'FORMAT' test.vcf

From the manual:

Similarly, "INFO" can be used to remove all INFO tags and "FORMAT" to remove all FORMAT tags except GT
(reply by finswimmer, 15 months ago)

You got me there, @finswimmer... but let me shorten it further:

$ bcftools annotate -x 'fmt' test.vcf

BTW, thanks for the bcftools tip, finswimmer.

(reply by cpad0112, 15 months ago)
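To confirm that a command from this thread really kept only GT, here is a minimal sketch (not from the thread) that counts data lines whose FORMAT column (column 9) is anything other than exactly GT. It assumes an uncompressed VCF file, which is what bcftools annotate writes to standard output unless -O/--output-type says otherwise; the function name count_non_gt_lines is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print how many data lines of an uncompressed VCF
# still have a FORMAT column (column 9) other than exactly "GT".
count_non_gt_lines() {
  # Lines starting with "#" are header lines and are skipped.
  awk -v FS='\t' '!/^#/ && $9 != "GT" { bad++ } END { print bad + 0 }' "$1"
}
```

For example, running `bcftools annotate -x FORMAT test.vcf > output.vcf` and then `count_non_gt_lines output.vcf` should print 0, provided every record carried a GT field to begin with.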
finswimmer (Germany) wrote, 15 months ago:

If awk is also fine, you can do it like this:

$ awk -v FS="\t" -v OFS="\t" '{for(i=9;i<=NF;i++) {split($i, gt, ":"); $i=gt[1]} print}' input.vcf > output.vcf

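For every column from FORMAT (column 9) to the last sample column, awk splits the value on ":" and replaces it with the first element. The VCF specification requires GT to be the first FORMAT key whenever it is present, so this leaves GT in the FORMAT column and only the genotype (e.g., 0|0) in each sample column. The "##" header lines usually contain no tabs, so they pass through unchanged, including the ##FORMAT definitions for DS and GP.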
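A minimal sketch of the same awk approach, wrapped in a hypothetical function keep_gt_only, with one addition that is not in the original command: lines starting with "#" are printed unchanged, so the #CHROM line (and any sample name containing ":") is never touched. Like the original, it assumes GT is present and is the first FORMAT key.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk one-liner above.
# Usage: keep_gt_only input.vcf > output.vcf
keep_gt_only() {
  awk -v FS='\t' -v OFS='\t' '
    /^#/ { print; next }              # header lines pass through untouched
    {
      for (i = 9; i <= NF; i++) {     # FORMAT column and every sample column
        split($i, parts, ":")         # e.g. "0|0:0:1,0,0" -> "0|0", "0", "1,0,0"
        $i = parts[1]                 # keep only the first key (GT)
      }
      print
    }' "$1"
}
```

The ##FORMAT header lines for DS and GP are kept; unused header definitions do not make the VCF invalid.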

fin swimmer
