You can do exactly this with vcffilter in vcflib!
Here's how to select all variants with depth greater than 10, mapping quality greater than 30, and QD greater than 20:
vcffilter -f "DP > 10 & MQ > 30 & QD > 20" file.vcf >filtered.vcf
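If you want to check an expression before running it on real data, a small toy file is handy. The sketch below is only an illustration: the file names toy.vcf and toy.filtered.vcf and all the values in the records are made up, and the vcffilter call is skipped if vcflib is not on your PATH.

```shell
#!/bin/sh
# Toy VCF with made-up values, only for trying out the filter expression.
# The header declares DP, MQ and QD because vcffilter uses it for type checking.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth">\n'
  printf '##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS mapping quality">\n'
  printf '##INFO=<ID=QD,Number=1,Type=Float,Description="Quality by depth">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\tDP=25;MQ=60;QD=22.5\n'
  printf 'chr1\t200\t.\tC\tT\t50\tPASS\tDP=8;MQ=60;QD=25.0\n'
  printf 'chr1\t300\t.\tG\tA\t50\tPASS\tDP=40;MQ=20;QD=30.1\n'
} > toy.vcf

# Run the same expression as above, but only if vcffilter is installed.
if command -v vcffilter >/dev/null 2>&1; then
  vcffilter -f "DP > 10 & MQ > 30 & QD > 20" toy.vcf > toy.filtered.vcf
  # Count the records that survived the filter.
  grep -vc '^#' toy.filtered.vcf
fi
```

Only the record at position 100 meets all three conditions, so that is the one I would expect to survive; the other two fail on DP and MQ respectively.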
Now, to select only variants that have homozygous alternate (1/1) genotypes, you can strip every genotype that isn't 1/1, recompute the file's AC and AF fields from the remaining genotypes with vcffixup, and then remove all sites where AC = 0 (again using vcffilter).
vcffilter -g "GT = 1/1" filtered.vcf | vcffixup - | vcffilter -f "AC > 0" >results.vcf
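The same pipeline can be tried on a toy file with genotypes. This is a rough sketch under assumptions: the sample names S1 and S2, the file names toy_gt.vcf and toy_gt.results.vcf, and every value in the records are invented for illustration, and the pipeline only runs if both vcffilter and vcffixup are installed.

```shell
#!/bin/sh
# Toy two-sample VCF with made-up genotypes, for trying the genotype pipeline.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count">\n'
  printf '##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate allele frequency">\n'
  printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\tAC=3;AF=0.75\tGT\t1/1\t0/1\n'
  printf 'chr1\t200\t.\tC\tT\t50\tPASS\tAC=1;AF=0.25\tGT\t0/1\t0/0\n'
} > toy_gt.vcf

# Keep 1/1 genotypes, recompute AC/AF, then drop sites left with AC = 0.
if command -v vcffilter >/dev/null 2>&1 && command -v vcffixup >/dev/null 2>&1; then
  vcffilter -g "GT = 1/1" toy_gt.vcf | vcffixup - | vcffilter -f "AC > 0" > toy_gt.results.vcf
fi
```

One thing to watch for: since strings are compared literally, a phased genotype written as 1|1 would presumably not match "GT = 1/1", so it is worth checking how your caller writes genotypes.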
The expression language is clunky: you have to put spaces between all tokens, including around parentheses. There is also no != operator, but as a workaround you can use ! ( expression ).
For instance, to pick up genotypes that are not 1/1, you'd use:
vcffilter -g "! ( GT = 1/1 )"
I'd like to fix some of these things (and also add regex matching for strings), but thus far it more than does the job for quick filtering operations. It lets me do virtually any kind of filtering from the command line without having to write a custom script.
These are the supported operations: > < = | & !, and symbols: ( ). Strings are interpreted literally. There is some type checking using the VCF header, so you have to have a valid VCF file. The output is a valid VCF file, so you can stream the filter results into another filtering operation.
I just tried
egrep '^#|"GT =1/1" | "DP>10","MQ>30"' my.vcf > filtered.vcf
Didn't work though.
I need to filter my VCF file to keep variants with at least 30 individuals in each of the possible genotype groups: major-allele homozygotes, heterozygotes, and minor-allele homozygotes. I would be grateful for any input. Thanks!
Please ask this as a new question.