I have a VCF file with ~30K sites across 131 samples. I am trying to filter it so that it includes only variable sites, meaning I want to exclude loci where all 131 samples have the same genotype, regardless of which allele is the reference. I used GATK SelectVariants with the -env option, but that only excludes sites where all samples are 0/0 (homozygous reference), not sites where all samples are 1/1 (homozygous alternate).
I am a pretty terrible coder and struggle to modify VCF files.
My question is: does anybody have a script, or know of a tool, that can remove the entire site (line) if all 131 samples (columns?) have 1/1 in the genotype field? Or, more generally, remove a site if all samples share the same genotype there, whether 0/0, 0/1, or 1/1? (GATK's -env already handles the all-0/0 case, but if it's easier to kill three birds with one stone, no problem.)
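A minimal sketch in plain Python (standard library only) that does the "three birds" version: it keeps every header line and drops any data line where all samples carry the same genotype. In a VCF the sample genotypes are columns 10 onward, and GT must be the first key of each sample field when it is present, which this sketch relies on. The script name `drop_monomorphic.py` and the file names are hypothetical. Two behaviours are my own assumptions, so adjust them if they don't fit your data: missing calls (`./.`) are ignored, so a site where every called sample is 1/1 and the rest are missing is dropped; and allele order and phasing are normalized, so `0/1`, `1/0`, `0|1` and `1|0` all count as the same heterozygous genotype.

```python
#!/usr/bin/env python3
"""Drop VCF sites where every sample carries the same genotype.

Hypothetical usage:
    python drop_monomorphic.py input.vcf > output.vcf
    python drop_monomorphic.py input.vcf.gz > output.vcf
"""
import gzip
import sys


def normalize_gt(sample_field):
    """Return a canonical genotype string for one sample column, or None if missing."""
    # Assumes GT is the first FORMAT key, as the VCF spec requires when GT is present
    gt = sample_field.split(":", 1)[0]
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None  # treat missing or partially missing calls as uninformative
    # Sort allele indices so 0/1, 1/0, 0|1 and 1|0 count as one genotype
    return "/".join(sorted(alleles, key=int))


def is_polymorphic(line):
    """True if the called genotypes on this data line are not all identical."""
    fields = line.rstrip("\n").split("\t")
    genotypes = {normalize_gt(s) for s in fields[9:]}  # sample columns start at column 10
    genotypes.discard(None)
    return len(genotypes) > 1


def filter_vcf(lines):
    """Yield header lines unchanged and only data lines with more than one genotype."""
    for line in lines:
        if line.startswith("#") or is_polymorphic(line):
            yield line


def main(path):
    # Open bgzipped/gzipped VCFs transparently, plain text otherwise
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in filter_vcf(handle):
            sys.stdout.write(line)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        sys.stderr.write("usage: drop_monomorphic.py input.vcf[.gz] > output.vcf\n")
```

Because genotypes are compared as allele-index strings, multiallelic sites also work: a site with a mix of 0/1 and 0/2 counts as variable. The output is uncompressed VCF; pipe it through `bgzip` if you need a compressed file for GATK afterwards.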