I have a set of genes, and I want to extract the VCF lines in which they are annotated as modified, for example with a SnpEff impact of "HIGH".
To intersect my list of genes with a VCF file and extract these HIGH impact variants, I can use this command:
java -jar SnpSift.jar filter -s gene_list.txt "( (EFF[*].IMPACT = 'HIGH') & (ANN[*].GENE in SET[0]) )" my.vcf
This extracts lines that have an annotation for a gene in my list and also have an annotation of HIGH impact. However, the two conditions are not required to hold for the same annotation, so it occasionally returns a line where the HIGH impact variant is in a gene not in my list, while a gene in my list appears on the same line with a different impact.
Is there a way to select only the lines that contain a HIGH impact variant in one of my genes (i.e. not lines where some other gene has a HIGH impact variant that merely overlaps one of my genes)?
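
To make the desired behaviour concrete, here is a minimal Python sketch of the per-annotation check I am after. It is not a SnpSift feature, just an illustration. It assumes the annotations are in the standard ANN INFO field, where each comma-separated entry is pipe-delimited with the impact in the third sub-field and the gene name in the fourth; the function names (load_genes, has_high_impact_in_genes, filter_vcf) are made up for this example.

```python
def load_genes(path):
    """Read one gene name per line into a set (same format as gene_list.txt)."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def ann_entries(info):
    """Yield each ANN entry of an INFO string as a list of sub-fields."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            for entry in field[len("ANN="):].split(","):
                yield entry.split("|")


def has_high_impact_in_genes(vcf_line, genes):
    """True only if a single ANN entry is both HIGH impact and in a listed gene."""
    columns = vcf_line.rstrip("\n").split("\t")
    if len(columns) < 8:
        return False  # not a complete VCF record
    for parts in ann_entries(columns[7]):
        # Assumed ANN layout: parts[2] is the impact, parts[3] is the gene name.
        if len(parts) > 3 and parts[2] == "HIGH" and parts[3] in genes:
            return True
    return False


def filter_vcf(lines, genes):
    """Keep header lines and records passing the per-annotation check."""
    for line in lines:
        if line.startswith("#") or has_high_impact_in_genes(line, genes):
            yield line
```

With this, something like sys.stdout.writelines(filter_vcf(open("my.vcf"), load_genes("gene_list.txt"))) would print only the lines I want. A line with HIGH impact in an unlisted gene and a lower impact in a listed gene is dropped, because both conditions are tested on the same annotation entry.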