set GT values to missing in VCF file for specific sample-variant combinations - bcftools?
1
0
Entering edit mode
5 months ago
karen2 • 0

Hi - I have a multi-subject vcf file and would like to set specific genotypes (GT) to missing for a set of subjects. However, the subjects that I need to set to missing are different for each variant.

For example, suppose I have this:

CHROM POS ID FORMAT sub1 sub2 sub3

22 12345 rs1111 GT 0/0 0/1 0/1

22 67891 rs2222 GT 1/1 0/0 0/0

22 45678 rs3333 GT 0/0 0/0 0/0

22 34567 rs4444 GT 0/1 0/0 0/1

I want to set rs1111 to missing for sub1, but then set rs3333 to missing for sub2 and sub3, so my results would look like this:

CHROM POS ID FORMAT sub1 sub2 sub3

22 12345 rs1111 GT ./. 0/1 0/1

22 67891 rs2222 GT 1/1 0/0 0/0

22 45678 rs3333 GT 0/0 ./. ./.

22 34567 rs4444 GT 0/1 0/0 0/1

I prefer using bcftools or vcftools, but am open to other ideas!

bcftools missing GT vcf • 435 views
ADD COMMENT
1
Entering edit mode
5 months ago

using vcffilterjdk https://jvarkit.readthedocs.io/en/latest/VcfFilterJdk/ , for your first rs, not tested.

 java -jar  varkit.jar vcffilterjdk -e 'if("rs3333".equals(variant.getID())) {return new VariantContextBuilder(variant).genotypes(variant.getGenotypes().stream().map(G->(G.getSampleName().equals("sub2") || G.getSampleName().equals("sub3"))?GenotypeBuilder.createMissing(G.getSampleName(),G.getPloidy()):G ).collect(Collectors.toList())).make();} return variant;'  input.vcf
ADD COMMENT
0
Entering edit mode

Wow, thanks! It took me a few days to get this installed and running, but it is doing something close to what I need now. Thank you so much!

I am not very famil1ar with java, so I hope you don't mind me asking a couple follow up questions:

  1. So in my 'real life' VCF, I have other fields in FORMAT besides GT. For example, I have GT:GQ:SQ:DP:CN. I would like only the GT column to be set to ./. but the other format fields to remain the same. Right now, all fields are being set to missing. For example, I would like 0/0:51:0:57:2 to be changed only to ./.:51:0:57:2 instead of ./.:.:.:.. I assume something in the java expression needs to change, but I am not sure what.

  2. In my 'real life' VCF, I have about ~500 variants that each need to be set to missing for between 1-20 subjects. I could use your command ~500 times, each time creating a new output VCF, but I wonder if there is a way to do all of this in the same run of vcffilterjdk without having just an extremely long -e command. Any thoughts?

Again, thanks very much for your help!

ADD REPLY

Login before adding your answer.

Traffic: 1699 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6