Is there a tool that can output the raw genotype call (e.g., 0/1) rather than the actual basecall (A/T) from a VCF file?
3
5.9 years ago
mmats010 ▴ 80

I'd like simple "heterozygous" or "homozygous" designations for every sample and SNP in my multisample VCF file. Until now I have used the -GF GT option in GATK's VariantsToTable tool and then annotated the basecalls in Excel as either heterozygous or homozygous. This takes forever since Excel isn't really built for big data like this. Is there a simple way to output all of the SNPs as 0/0, 0/1, or 1/1 instead of C/A, A/A, G/T, C/C? My ideal output would be a txt file in a grid similar to what VariantsToTable produces: the top row lists the samples and the first column holds the variant coordinates.

gatk variantstotable SNP genotyping • 2.6k views
0

Isn't that pretty close to how a VCF file naturally looks?

0

"This takes forever since Excel isn't really built for big data like this."

That's a bit of an understatement. Good that you're looking for an alternative!

6
5.9 years ago

If you start from a valid VCF file, you can get your desired output simply with this command:

grep -v '^##' input.vcf | cut -f1,2,10- | sed 's/:\S*//g'

grep removes every header line except the column names, cut selects the chromosome, position, and sample columns, and sed strips everything but GT from the genotype columns. This relies on GT being the first FORMAT field, which the VCF specification requires whenever GT is present.
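For anyone who prefers a single awk pass, here is a minimal sketch of the same extraction, not a replacement for a dedicated VCF tool. The function names vcf_gt_table and gt_to_zygosity are made up for illustration. It makes the same assumption as the one-liner (GT is the first FORMAT field), but it only edits the sample columns, so a contig name containing a colon is left alone, and it does not need the \S shorthand, which is a GNU sed extension. The second function is an optional follow-up that turns diploid calls into the het/hom labels the question asks about.

```shell
# Hypothetical helper: print CHROM, POS and the GT subfield of every sample.
# Assumes GT is the first FORMAT key (required by the VCF spec when GT is present).
vcf_gt_table() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^##/ { next }                        # skip meta-information lines
       /^#CHROM/ {                           # keep the column-name line as-is
         line = $1 OFS $2
         for (i = 10; i <= NF; i++) line = line OFS $i
         print line
         next
       }
       {
         line = $1 OFS $2
         for (i = 10; i <= NF; i++) {
           split($i, parts, ":")             # GT is the first colon-separated subfield
           line = line OFS parts[1]
         }
         print line
       }' "$@"
}

# Hypothetical follow-up: relabel each diploid call as hom, het, or missing.
# Calls that are not diploid (e.g. haploid) are passed through unchanged.
gt_to_zygosity() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }                  # header row passes through
       {
         for (i = 3; i <= NF; i++) {
           n = split($i, a, /[\/|]/)         # alleles are separated by / or |
           if ($i ~ /\./)                   $i = "missing"
           else if (n == 2 && a[1] == a[2]) $i = "hom"
           else if (n == 2)                 $i = "het"
         }
         print
       }' "$@"
}
```

Usage would look like vcf_gt_table input.vcf > genotypes.txt for the 0/1 grid, or vcf_gt_table input.vcf | gt_to_zygosity > zygosity.txt for the labels. Note that "hom" here covers both 0/0 and 1/1; split those cases if you need to tell them apart.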

0

Thanks, this seems like the quickest method. My only worry is that the Broad's documentation on generating tables from VCF files warns very sternly against parsing a VCF file without a dedicated tool:

"No, really, don't write your own parser if you can avoid it. This is not a comment on how smart or how competent we think you are -- it's a comment on how annoyingly obtuse and convoluted the VCF format is."

However, my VCF file seems valid and I don't see any weird outputs, so thanks again.
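If you want something a bit firmer than eyeballing the output, one cheap sanity check is to confirm that every record really does list GT first in its FORMAT column, since the text-based approach silently depends on that. This is only a sketch; check_gt_first is a hypothetical name, and it checks this single assumption, not the overall validity of the file.

```shell
# Hypothetical check: report records whose FORMAT column (field 9) does not start with GT.
# Prints up to five offending lines, then a summary; exits non-zero if any were found.
check_gt_first() {
  awk -F'\t' '
    /^#/ { next }                            # ignore all header lines
    $9 !~ /^GT(:|$)/ {
      bad++
      if (bad <= 5) print "line " NR ": FORMAT=" $9
    }
    END {
      printf "%d record(s) without a leading GT\n", bad
      exit (bad > 0)
    }
  ' "$@"
}
```

Run it as check_gt_first input.vcf before trusting the extracted table; a count of 0 means the GT-first assumption holds for every record.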

2
5.9 years ago

You are looking for a conversion to PLINK format, which VCFtools can do with its --plink option; see this page.

1
5.9 years ago

VCFtools and VCFlib support filtering by genotype.