Question: Is there a tool that can output the raw genotype call (i.e., 0/1) rather than the actual basecall (A/T) from a VCF file?

mmats010 wrote 2.6 years ago:

I'm interested in getting simple "heterozygous" or "homozygous" designations for all of the samples/SNPs in my multisample VCF file. In the past, I have been using the -GF GT option in GATK's VariantsToTable tool, and then annotating my basecalls in Excel as either heterozygous or homozygous. This takes forever since Excel isn't really built for big data like this. Is there a simple way to output all of the SNPs as 0/1, 0/0, 0/1, or 1/1 instead of C/A, A/A, G/T, C/C? My ideal output would be a txt file in a grid similar to how VariantsToTable outputs data: top row is each sample, while first column is the variant coordinates.

Isn't that pretty close to how a vcf file naturally looks?

Reply by swbarnes2, 2.6 years ago

"This takes forever since Excel isn't really built for big data like this."

That's a bit of an understatement. Good that you are looking for an alternative!

Reply by WouterDeCoster, 2.6 years ago

Jorge Amigo (Santiago de Compostela, Spain) wrote 2.6 years ago:

If you start from a valid VCF file, you can get your desired output with this command:

grep -v '^##' input.vcf | cut -f1,2,10- | sed 's/:\S*//g'

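grep removes every header line except the column names, cut selects the chromosome, position, and sample columns, and sed strips everything after the genotype (GT) from each sample column. This relies on GT being the first FORMAT subfield, which the VCF specification requires whenever GT is present.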
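
For anyone who prefers awk, or who wants het/hom labels rather than 0/1 strings, here is a minimal sketch of the same idea. The function name vcf_gt_table and its gt/zyg modes are made up for illustration and are not part of any tool mentioned in this thread. It assumes a well-formed, tab-delimited VCF in which GT is the first FORMAT key, and it labels any call whose alleles all match (including haploid calls) as hom.

```shell
# Hypothetical helper (not from the thread): print CHROM, POS and one column per sample.
#   mode "gt"  keeps the raw genotype string (0/0, 0/1, 1/1, ...)
#   mode "zyg" replaces it with hom, het or missing
# Assumes a well-formed, tab-delimited VCF with GT as the first FORMAT key.
vcf_gt_table() {
    mode=$1
    shift
    awk -F '\t' -v OFS='\t' -v mode="$mode" '
        function zygosity(gt,    a, n, i) {
            n = split(gt, a, "[/|]")            # alleles are separated by / or |
            for (i = 1; i <= n; i++)
                if (a[i] == ".") return "missing"
            for (i = 2; i <= n; i++)
                if (a[i] != a[1]) return "het"
            return "hom"
        }
        /^##/ { next }                          # drop meta-information lines
        /^#/ {                                  # column-name line: keep sample names as-is
            row = $1 OFS $2
            for (i = 10; i <= NF; i++) row = row OFS $i
            print row
            next
        }
        {
            row = $1 OFS $2                     # CHROM and POS
            for (i = 10; i <= NF; i++) {
                gt = $i
                sub(/:.*/, "", gt)              # GT is everything before the first colon
                row = row OFS (mode == "zyg" ? zygosity(gt) : gt)
            }
            print row
        }
    ' "$@"
}
```

Calling it as vcf_gt_table zyg input.vcf > zygosity.txt should give a grid with one row per variant and one column per sample. Unlike the sed version, this only edits the sample columns of data lines, so chromosome or sample names that happen to contain a colon are left alone.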

Thanks, this seems like the quickest method. My only worry is that the Broad's documentation on generating tables from VCF files warns very sternly against parsing a VCF file without a dedicated tool:

"No, really, don't write your own parser if you can avoid it. This is not a comment on how smart or how competent we think you are -- it's a comment on how annoyingly obtuse and convoluted the VCF format is."

However, my VCF file seems valid and I don't see any weird output, so thanks again.

Reply by mmats010, 2.6 years ago
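
Since the one-liner does no validation of its own, a quick check of the assumption it depends on may be reassuring. This is only a sketch: vcf_check_gt_first is a hypothetical name, and the check covers just two things (GT is the first FORMAT key, and every record has as many columns as the column-name line). It is not a substitute for a real VCF validator.

```shell
# Hypothetical sanity check (not from the thread): report data lines whose
# FORMAT column does not start with GT, or whose column count differs from
# the column-name line. Exits with status 1 if any problem is found.
vcf_check_gt_first() {
    awk -F '\t' '
        /^##/ { next }                          # skip meta-information lines
        /^#/  { ncol = NF; next }               # remember the header column count
        $9 !~ /^GT(:|$)/ {
            printf "line %d: FORMAT is \"%s\", GT is not first\n", NR, $9
            bad = 1
        }
        NF != ncol {
            printf "line %d: %d columns, header has %d\n", NR, NF, ncol
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$@"
}
```

Running vcf_check_gt_first input.vcf before the grep/cut/sed pipeline prints nothing and returns 0 when both conditions hold for every record.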

WouterDeCoster (Belgium) wrote 2.6 years ago:

You are looking for a conversion to PLINK format, which you can do with VCFtools --plink; see this page.

harold.smith.tarheel (United States) wrote 2.6 years ago:

VCFtools and VCFlib support filtering by genotype.
