Question: VCFtools. Get rid of DS:GP phased genotypes while keeping GT and remaining fields
Mr Locuace (Chile) wrote, 6 months ago:

Hello, I have a question about VCFtools.

I have a VCF file with the usual 9 columns followed by phased data for several samples. The per-sample data is in the format GT:DS:GP (e.g., 0|0:0:1,0,0). I would like to get the original VCF file back, but with only the GT genotypes (e.g., 0|0) in the sample columns.

With the VCFtools (v0.1.14) option "--extract-FORMAT-info GT" I get the GT genotypes, but only the CHROM and POS columns are kept.

If someone knows how to do this with VCFtools or with other software, it would be very helpful. Thank you.

cpad0112 (India) wrote, 6 months ago:

Good description of the requirements. It would help if you could post some example input data. To retain only GT in the FORMAT field, try:

 bcftools annotate -x '^FORMAT/GT' test.vcf

P.S. Could you please edit the title, replacing "read of" with "rid of"?


@cpad0112, you should move this to an answer.

And this time I can show you a (slightly) shorter way to do it :)

$ bcftools annotate -x 'FORMAT' test.vcf

From the manual:

Similarly, "INFO" can be used to remove all INFO tags and "FORMAT" to remove all FORMAT tags except GT
written 6 months ago by finswimmer

You had me there, @finswimmer... but let me shorten it further:

$ bcftools annotate -x 'fmt' test.vcf

By the way, thanks for the bcftools tip, finswimmer.

written 6 months ago by cpad0112
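
To make the bcftools suggestion above concrete, here is a minimal sketch on a toy file. The file names toy.vcf and toy.gt.vcf, the sample names S1 and S2, and the single variant are all made up for illustration. The sketch assumes bcftools is installed and on the PATH, and skips the bcftools step if it is not.

```shell
# Build a tiny, made-up VCF (hypothetical data, tab-separated).
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=1>\n'
  printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
  printf '##FORMAT=<ID=DS,Number=1,Type=Float,Description="Dosage">\n'
  printf '##FORMAT=<ID=GP,Number=G,Type=Float,Description="Genotype probabilities">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n'
  printf '1\t100\trs1\tA\tG\t.\tPASS\t.\tGT:DS:GP\t0|0:0:1,0,0\t0|1:1:0,1,0\n'
} > toy.vcf

# Remove all FORMAT tags except GT and write the result to a new file.
# The step is skipped when bcftools is not available.
if command -v bcftools >/dev/null 2>&1; then
  bcftools annotate -x 'FORMAT' -o toy.gt.vcf toy.vcf
  # Show the FORMAT and sample columns of the data line.
  grep -v '^#' toy.gt.vcf | cut -f 9-
fi
```

The first eight columns are left as they are, which is the part that "--extract-FORMAT-info GT" did not preserve.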
finswimmer (Germany) wrote, 6 months ago:

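If awk is also fine, you can do it like this:

$ awk -v FS="\t" -v OFS="\t" '/^#/ {print; next} {for(i=9;i<=NF;i++) {split($i, gt, ":"); $i=gt[1]} print}' input.vcf > output.vcf

Header lines (starting with #) are printed unchanged. In every other line, for each column from the FORMAT column (column 9) to the end, awk splits the value on ":" and replaces the column with the first piece. This relies on GT being the first FORMAT key, which the VCF specification requires whenever GT is present. Note that the ##FORMAT header definitions for DS and GP stay in the output, because the header is not modified.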
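
As a quick sanity check, here is a small sketch that runs this awk approach on a toy record, using a variant that also passes header lines through unchanged. The file names toy.vcf and toy.gt.vcf, the sample names S1 and S2, and the single variant are made up for illustration. It assumes GT is the first FORMAT key, as in the question.

```shell
# Build a tiny, made-up VCF (hypothetical data, tab-separated).
printf '##fileformat=VCFv4.2\n' > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> toy.vcf
printf '1\t100\trs1\tA\tG\t.\tPASS\t.\tGT:DS:GP\t0|0:0:1,0,0\t0|1:1:0,1,0\n' >> toy.vcf

# Print header lines as-is. In data lines, keep only the first
# colon-separated subfield (assumed to be GT) in the FORMAT column
# and in every sample column.
awk -v FS="\t" -v OFS="\t" '/^#/ {print; next} {for(i=9;i<=NF;i++) {split($i, gt, ":"); $i=gt[1]} print}' toy.vcf > toy.gt.vcf

# Show the data line; the columns are tab-separated:
# 1  100  rs1  A  G  .  PASS  .  GT  0|0  0|1
grep -v '^#' toy.gt.vcf
```

Columns 1 to 8 are untouched, so the result keeps the full VCF layout with only GT left in the FORMAT and sample columns.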

fin swimmer
