Question

Is there a tool that can output the raw genotype call (i.e., 0/1) rather than the actual basecall (A/T) from a VCF file?

3

8.7 years ago

mmats010 ▴ 80

I'm interested in getting simple "heterozygous" or "homozygous" designations for all of the samples/SNPs in my multisample VCF file. In the past, I have used the -GF GT option in GATK's VariantsToTable tool and then annotated my basecalls in Excel as either heterozygous or homozygous. This takes forever since Excel isn't really built for big data like this. Is there a simple way to output all of the SNPs as 0/0, 0/1, or 1/1 instead of C/A, A/A, G/T, C/C? My ideal output would be a text file laid out as a grid, similar to VariantsToTable output: the top row lists the samples and the first column holds the variant coordinates.

gatk variantstotable SNP genotyping • 3.8k views

ADD COMMENT • link updated 8.7 years ago by Jorge Amigo 14k • written 8.7 years ago by mmats010 ▴ 80

0

Isn't that pretty close to how a VCF file naturally looks?

ADD REPLY • link 8.7 years ago by swbarnes2 15k

0

This takes forever since Excel isn't really built for big data like this.

That's a bit of an understatement. Good that you try to find an alternative!

ADD REPLY • link 8.7 years ago by WouterDeCoster 48k

2

8.7 years ago

WouterDeCoster 48k

You are looking for a conversion to PLINK format, which you can do with VCFtools --plink; see this page.

1

8.7 years ago

harold.smith.tarheel ★ 5.0k

VCFtools and VCFlib support filtering by genotype.

score 7 · Accepted Answer · 2016-11-01

8.7 years ago

Jorge Amigo 14k

If you start from a valid VCF file, you can get your desired output simply with this command:

grep -v '^##' input.vcf | cut -f1,2,10- | sed 's/:\S*//g'

grep removes all header lines except the column names, cut selects the chromosome, position, and sample columns, and sed strips everything but GT from each genotype column. Note that \S is a GNU sed extension.
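
For the heterozygous/homozygous labels the question asks for, the same idea can be sketched in awk. This is only an illustration under stated assumptions: vcf_zygosity is a hypothetical helper name, the demo VCF is made up, and GT is assumed to be the first FORMAT subfield (which the VCF specification requires whenever GT is present).

```shell
# Hypothetical helper: print CHROM, POS and one zygosity label per sample.
# Assumes a tab-delimited VCF in which GT is the first FORMAT subfield.
vcf_zygosity() {
  awk '
    BEGIN { FS = OFS = "\t" }
    /^##/ { next }                          # drop meta-information lines
    {
      line = $1 OFS $2                      # CHROM and POS
      for (i = 10; i <= NF; i++) {          # sample columns start at field 10
        if (/^#/) { line = line OFS $i; continue }  # header row: keep sample names
        split($i, f, ":")                   # f[1] is the GT subfield
        n = split(f[1], a, "[/|]")          # alleles, unphased (/) or phased (|)
        if (f[1] ~ /\./)                 label = "missing"
        else if (n == 2 && a[1] == a[2]) label = "hom"
        else if (n == 2)                 label = "het"
        else                             label = f[1]  # other ploidy: leave as-is
        line = line OFS label
      }
      print line
    }
  ' "$1"
}

# Made-up two-sample VCF, for demonstration only.
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> "$demo_vcf"
printf '1\t100\t.\tC\tA\t50\tPASS\t.\tGT:DP\t0/1:12\t1/1:9\n' >> "$demo_vcf"
printf '1\t200\t.\tG\tT\t50\tPASS\t.\tGT:DP\t0/0:7\t./.:0\n' >> "$demo_vcf"

vcf_zygosity "$demo_vcf"
```

Redirecting the output to a file gives a tab-delimited grid with samples across the top row and coordinates in the first two columns. Both 0/0 and 1/1 are labelled "hom" here; splitting them into separate labels would need one more branch.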

0

Thanks, this seems like the quickest method. I only worry because the Broad's documentation on generating tables from VCF files warns very sternly against parsing a VCF file without a dedicated tool:

No, really, don't write your own parser if you can avoid it. This is not a comment on how smart or how competent we think you are -- it's a comment on how annoyingly obtuse and convoluted the VCF format is.

However, my VCF file seems valid and I don't see any weird output, so thanks again.

ADD REPLY • link 8.7 years ago by mmats010 ▴ 80
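
The sed step in the accepted answer keeps only the text before the first colon, so it relies on GT being the first FORMAT key on every record. A quick sanity check along those lines might look like the sketch below. check_gt_first is a hypothetical name, the two demo records are made up, and this is not a replacement for a real VCF validator.

```shell
# Hypothetical check: list data lines whose FORMAT column (field 9)
# does not start with GT; the exit status is non-zero if any are found.
check_gt_first() {
  awk -F'\t' '
    /^#/ { next }                                   # skip all header lines
    $9 !~ /^GT(:|$)/ { bad++; print "line " NR ": FORMAT=" $9 }
    END { exit (bad > 0) }
  ' "$1"
}

# Made-up records, for demonstration only.
good_vcf=$(mktemp)
printf '1\t100\t.\tC\tA\t50\tPASS\t.\tGT:DP\t0/1:12\n' > "$good_vcf"
bad_vcf=$(mktemp)
printf '1\t100\t.\tC\tA\t50\tPASS\t.\tDP:GT\t12:0/1\n' > "$bad_vcf"

if check_gt_first "$good_vcf"; then echo "good_vcf: GT first"; fi
if ! check_gt_first "$bad_vcf"; then echo "bad_vcf: needs a proper parser"; fi
```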