Fixing genotypes from split vcf
graeme.thorn, 3.8 years ago

I have split a multi-sample VCF into one VCF per sample, and I was wondering whether there is a simple method for setting the genotype of each variant to 0/1 in each single-sample VCF and changing the alternate allele to match the new genotype. For instance, if the VCF contains a read like

chr1 945122 . C A,T . PASS <INFO STRING> GT:ABQ:AD:ADF:ADR:DP:FREQ:GQ:PVAL:RBQ:RD:RDF:RDR:SDP 0/2:57:4:1:3:584:0.68%:0:9.8E-1:50:578:449:129:584

then I would like it to read

chr1 945122 . C T . PASS <INFO STRING> GT:ABQ:AD:ADF:ADR:DP:FREQ:GQ:PVAL:RBQ:RD:RDF:RDR:SDP 0/1:57:4:1:3:584:0.68%:0:9.8E-1:50:578:449:129:584

so that the ALT allele and the genotype still agree, but no other alternate alleles remain in the VCF.
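A minimal Python sketch of the transformation described above, assuming a single-sample, tab-separated VCF data line with GT among the FORMAT keys. The function name trim_unused_alts is hypothetical and not part of any tool. It keeps only the ALT alleles that the sample's GT refers to and renumbers GT to match. It does not touch INFO or the other FORMAT fields, so any per-allele values (Number=A or Number=R) would still need to be subset separately.

```python
def trim_unused_alts(line: str) -> str:
    """Drop ALT alleles not referenced by the sample's GT and renumber GT.

    Hypothetical helper. Assumes a single-sample, tab-separated VCF data line.
    Only ALT and GT are rewritten; other FORMAT values are left as they are.
    """
    cols = line.rstrip("\n").split("\t")
    alts = cols[4].split(",")
    fmt_keys = cols[8].split(":")
    sample = cols[9].split(":")
    gt_index = fmt_keys.index("GT")
    gt = sample[gt_index]

    # Split the genotype on its separator: '/' for unphased, '|' for phased.
    sep = "|" if "|" in gt else "/"
    alleles = gt.split(sep)

    # Original (1-based) ALT indices that this sample actually carries.
    used = sorted({int(a) for a in alleles if a not in (".", "0")})
    if not used:
        # Hom-ref or missing genotype: nothing to renumber, record left unchanged.
        return line.rstrip("\n")

    # Map old allele index -> new allele index; REF (0) stays 0.
    remap = {0: 0}
    for new_idx, old_idx in enumerate(used, start=1):
        remap[old_idx] = new_idx

    new_alleles = [a if a == "." else str(remap[int(a)]) for a in alleles]
    sample[gt_index] = sep.join(new_alleles)
    cols[4] = ",".join(alts[i - 1] for i in used)
    cols[9] = ":".join(sample)
    return "\t".join(cols)
```

For example, the record from the question (with "." standing in for the INFO string and the FORMAT keys shortened to GT:ABQ:AD) comes back with ALT T and genotype 0/1, and the remaining sample values are unchanged.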

Is there a tool for tidying up a VCF like this?

vcf • 825 views
Isn't this just a bcftools norm -m-any solution?

Yeah, if done before splitting. Also, I'd recommend vt decompose over bcftools norm -m-any: vt retains information on the variants it "transforms".

You should split multi-allelic sites before splitting VCFs into sample-specific VCFs, if such "clean" genotypes are a necessity.

Use vt decompose to split multi-allelics, then any tool of your choice to get single sample VCFs.

By the way, an entry in a VCF file is a site/location/variant, not a "read".
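To illustrate why decomposing multi-allelic sites before splitting by sample gives clean genotypes, here is a minimal Python sketch that turns one multi-allelic record into biallelic records, one per ALT allele. It is not how vt decompose or bcftools norm are implemented. The function name and the convention of recoding a sample's other ALT alleles to reference (0, configurable via the hypothetical other_allele parameter) are assumptions; the real tools may fill those alleles differently and also rewrite INFO and per-allele FORMAT fields, which this sketch leaves unchanged.

```python
from typing import List


def decompose_multiallelic(line: str, other_allele: str = "0") -> List[str]:
    """Split one tab-separated VCF record into one biallelic record per ALT.

    Hypothetical helper. For each ALT allele, GT alleles equal to that ALT
    become 1, REF stays 0, missing stays '.', and any other ALT allele is
    replaced by `other_allele` (an assumed convention, default '0').
    """
    cols = line.rstrip("\n").split("\t")
    alts = cols[4].split(",")
    gt_index = cols[8].split(":").index("GT")
    records = []
    for alt_idx, alt in enumerate(alts, start=1):
        new_cols = cols[:9]  # copy of the fixed columns CHROM..FORMAT
        new_cols[4] = alt
        for sample_field in cols[9:]:
            values = sample_field.split(":")
            gt = values[gt_index]
            sep = "|" if "|" in gt else "/"
            recoded = []
            for a in gt.split(sep):
                if a == ".":
                    recoded.append(".")
                elif int(a) == alt_idx:
                    recoded.append("1")
                elif a == "0":
                    recoded.append("0")
                else:
                    recoded.append(other_allele)
            values[gt_index] = sep.join(recoded)
            new_cols.append(":".join(values))
        records.append("\t".join(new_cols))
    return records
```

For a sample with GT 0/2 at the C to A,T site, this yields a C>A record with GT 0/0 and a C>T record with GT 0/1. After splitting by sample, the records where that sample is 0/0 can be filtered out, leaving the 0/1 call the question asks for.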
