Question: Fixing genotypes from split vcf
Asked 3 months ago by graeme.thorn (London, United Kingdom):

I have split a multi-sample VCF into one VCF per sample, and I was wondering whether there is a simple method for setting the genotype of each variant to 0/1 in every single-sample VCF and changing the alternate allele to match the new genotype. For instance, if the VCF contains a read like

chr1 945122 . C A,T . PASS <INFO STRING> GT:ABQ:AD:ADF:ADR:DP:FREQ:GQ:PVAL:RBQ:RD:RDF:RDR:SDP 0/2:57:4:1:3:584:0.68%:0:9.8E-1:50:578:449:129:584

then I would like it to read

chr1 945122 . C T . PASS <INFO STRING> GT:ABQ:AD:ADF:ADR:DP:FREQ:GQ:PVAL:RBQ:RD:RDF:RDR:SDP 0/1:57:4:1:3:584:0.68%:0:9.8E-1:50:578:449:129:584

so that the alt allele and the genotype still match, but no other alternate alleles remain in the VCF.

Is there a tool for tidying a vcf up like this?


Isn't this just a bcftools norm -m-any solution?

written 3 months ago by Kevin Blighe

Yeah, if done before splitting. Also, I'd recommend vt decompose over bcftools norm -m-any - vt retains information on the variants it "transforms".

written 3 months ago (modified 3 months ago) by RamRS

You should split multi-allelic sites before splitting VCFs into sample-specific VCFs, if such "clean" genotypes are a necessity.

Use vt decompose to split multi-allelics, then any tool of your choice to get single sample VCFs.

By the way, an entry in a VCF file is a site/location/variant, not a "read".

written 3 months ago by RamRS
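
If the per-sample VCFs already exist and re-running the split is not practical, the tidy-up described in the question can also be sketched directly in Python. This is only a minimal illustration, not a replacement for vt decompose or bcftools norm: the function name tidy_record is hypothetical, and it assumes tab-separated records with exactly one sample column and a GT key in FORMAT. It keeps only the ALT alleles that the genotype refers to and renumbers the genotype to match. It does not touch INFO or FORMAT values that are listed per allele (for example a multi-valued AD or AF), so those would still need checking by hand.

```python
def tidy_record(line):
    """Drop ALT alleles the sample's GT does not use and renumber the GT.

    Assumes a tab-separated, single-sample VCF data line (not a '#' header).
    Per-allele INFO/FORMAT values are NOT adjusted (assumption: not needed).
    """
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    fmt_keys = fields[8].split(":")
    sample = fields[9].split(":")

    gt_idx = fmt_keys.index("GT")
    gt = sample[gt_idx]
    sep = "|" if "|" in gt else "/"  # preserve phasing separator
    alleles = gt.split(sep)

    # ALT indices (1-based) that the genotype actually refers to
    used = sorted({int(a) for a in alleles if a not in (".", "0")})
    if not used:
        # Hom-ref or missing genotype: nothing to anchor on, leave unchanged
        return "\t".join(fields)

    # Map old ALT index -> new ALT index, e.g. {2: 1} for GT 0/2
    remap = {old: new for new, old in enumerate(used, start=1)}

    fields[4] = ",".join(alts[i - 1] for i in used)
    sample[gt_idx] = sep.join(
        a if a in (".", "0") else str(remap[int(a)]) for a in alleles
    )
    fields[9] = ":".join(sample)
    return "\t".join(fields)


if __name__ == "__main__":
    record = "chr1\t945122\t.\tC\tA,T\t.\tPASS\t.\tGT:AD:DP\t0/2:4:584"
    print(tidy_record(record))
```

To apply it to a file, pass every line that does not start with "#" through tidy_record and copy header lines through unchanged. A sample that carries two different alternate alleles (GT 1/2) keeps both ALT alleles, since neither can be dropped without changing the genotype.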