Question: Fixing genotypes from split vcf
graeme.thorn (London, United Kingdom) wrote, 3 months ago:

I have split a multi-sample VCF into one VCF per sample, and I was wondering if there is a simple method for fixing the genotype of each variant to 0/1 in every single-sample VCF, and changing the alternate allele to match the new genotype. For instance, if the VCF contains a read like

chr1 945122 . C A,T . PASS <INFO STRING> GT:ABQ:AD:ADF:ADR:DP:FREQ:GQ:PVAL:RBQ:RD:RDF:RDR:SDP 0/2:57:4:1:3:584:0.68%:0:9.8E-1:50:578:449:129:584

then I would like it to read

chr1 945122 . C T . PASS <INFO STRING> GT:ABQ:AD:ADF:ADR:DP:FREQ:GQ:PVAL:RBQ:RD:RDF:RDR:SDP 0/1:57:4:1:3:584:0.68%:0:9.8E-1:50:578:449:129:584

so that the alt allele and the genotype still match, just no other alternate alleles are in the VCF.

Is there a tool for tidying a vcf up like this?


Isn't this just a bcftools norm -m-any solution?

written 3 months ago by Kevin Blighe

Yeah, if done before splitting into samples. Also, I'd recommend vt decompose over bcftools norm -m-any, since vt retains information on the variants it "transforms".

modified 3 months ago • written 3 months ago by RamRS

You should split multi-allelic sites before splitting VCFs into sample-specific VCFs, if such "clean" genotypes are a necessity.

Use vt decompose to split multi-allelics, then any tool of your choice to get single sample VCFs.

By the way, an entry in a VCF file is a site/location/variant, not a "read".

written 3 months ago by RamRS
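
If the per-sample files already exist and re-running the split is not an option, the clean-up described in the question could be sketched in a few lines of Python. This is only an illustration, not an existing tool: the function name trim_unused_alts is made up, and it assumes a tab-delimited record with exactly one sample column. It drops ALT alleles that the sample's GT does not reference and renumbers the GT accordingly. It does not touch per-allele INFO or FORMAT values (anything declared as Number=A, R or G in the header), so those would still need attention. The "." in the INFO column of the example record is a placeholder for the INFO string elided in the question.

```python
import re


def trim_unused_alts(line):
    """Drop ALT alleles not referenced by the single sample's GT and renumber GT."""
    line = line.rstrip("\n")
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.split("\t")
    alts = fields[4].split(",")
    fmt_keys = fields[8].split(":")
    sample = fields[9].split(":")
    gt_idx = fmt_keys.index("GT")

    # Split the genotype into allele indices and phasing separators,
    # e.g. "0/2" -> ["0", "/", "2"]; missing alleles (".") are kept as-is.
    tokens = re.split(r"([/|])", sample[gt_idx])
    used = sorted({int(t) for t in tokens if t.isdigit() and int(t) > 0})

    # Nothing to do if no ALT allele is called, or if every ALT is in use.
    if not used or len(used) == len(alts):
        return line

    # Map old allele indices to new ones; the reference allele stays 0.
    remap = {old: new for new, old in enumerate(used, start=1)}
    remap[0] = 0
    fields[4] = ",".join(alts[i - 1] for i in used)
    sample[gt_idx] = "".join(
        str(remap[int(t)]) if t.isdigit() else t for t in tokens
    )
    fields[9] = ":".join(sample)
    return "\t".join(fields)


# The record from the question, with "." standing in for the INFO string.
record = "\t".join([
    "chr1", "945122", ".", "C", "A,T", ".", "PASS", ".",
    "GT:ABQ:AD:ADF:ADR:DP:FREQ:GQ:PVAL:RBQ:RD:RDF:RDR:SDP",
    "0/2:57:4:1:3:584:0.68%:0:9.8E-1:50:578:449:129:584",
])
fixed = trim_unused_alts(record)
print(fixed)
```

For the record in the question this rewrites ALT from A,T to T and GT from 0/2 to 0/1. A genotype such as 1/2 is left alone, because both alternate alleles are in use and the site cannot be reduced to 0/1 without decomposing it first.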