Hi all,
I have a set of genes, and I want to extract the VCF lines in which they are annotated as modified, for example with a SnpEff impact of "HIGH".
To intersect my list of genes with a VCF file and extract these HIGH impact variants, I can use this command:
java -jar SnpSift.jar filter -s gene_list.txt "( (EFF[*].IMPACT = 'HIGH') & ANN[*].GENE in SET[0] )" my.vcf
This extracts lines that have an annotation for a gene in my list and also an annotation of HIGH impact. However, the two conditions do not have to hold for the same annotation, so it occasionally picks up a line where the HIGH impact variant is in a gene not in my list, while the gene from my list only has a variant of a different impact on that same line.
Is there a way to select only the lines where the HIGH impact variant is in one of my genes (i.e. not lines where some other gene has a HIGH impact variant that merely overlaps one of my genes)?
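
To make the intent concrete, here is a rough Python sketch of the per-annotation test I am after. It is not a SnpSift feature, just an illustration. It assumes the annotations are in the standard ANN INFO field (comma-separated entries, pipe-separated subfields, impact in the 3rd subfield and gene name in the 4th); the older EFF layout is different and is not handled. The function names are my own.

```python
def has_high_impact_in_genes(info, genes):
    """Return True if at least one ANN entry is HIGH impact AND its gene is in `genes`.

    `info` is the raw INFO column of a VCF line; `genes` is a set of gene names.
    Both conditions must hold for the same annotation entry.
    """
    for field in info.split(";"):
        if not field.startswith("ANN="):
            continue
        for entry in field[len("ANN="):].split(","):
            parts = entry.split("|")
            # Assumed ANN layout: Allele | Annotation | Impact | Gene_Name | ...
            if len(parts) > 3 and parts[2] == "HIGH" and parts[3] in genes:
                return True
    return False


def filter_vcf_lines(lines, genes):
    """Yield header lines unchanged, plus variant lines that pass the test above."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        # INFO is the 8th tab-separated column of a VCF data line.
        if len(cols) > 7 and has_high_impact_in_genes(cols[7], genes):
            yield line
```

In practice I would read gene_list.txt into a set (one gene name per line) and pass the open my.vcf file handle as `lines`, but ideally I would like to do this within SnpSift itself.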