SnpSift set intersection with EFF annotations
19 months ago
Richard ▴ 580

Hi all,

I have a set of genes, and I want to extract the VCF records in which one of those genes is annotated as modified with a given SnpEff impact, for example "HIGH".

To intersect my list of genes with a VCF file and extract these HIGH impact variants, I can use this command:

java -jar SnpSift.jar filter -s gene_list.txt  "(  (EFF[*].IMPACT = 'HIGH') & ANN[*].GENE in SET[0] )" my.vcf


This extracts lines that have an annotation for a gene in my list and also have an annotation of HIGH impact. However, the two conditions are not tied to the same annotation, so the command occasionally returns a line where the HIGH impact variant is in a gene not in my list, while the gene from my list appears on the same line with a different impact.
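
To make the problem concrete, here is a small toy sketch in plain shell/awk (it does not call SnpSift). It assumes an ANN-style value where annotations are comma-separated and the pipe-separated fields put the impact third and the gene name fourth. The gene names, IDs, and the function names `line_level` and `per_annotation` are made up for illustration.

```shell
# Toy ANN value with two annotations on one VCF line (hypothetical genes):
# GENE_A is HIGH impact but not in the list; GENE_B is in the list but MODERATE.
ann='T|stop_gained|HIGH|GENE_A|id1,T|missense_variant|MODERATE|GENE_B|id2'
genes='GENE_B'

# Line-level test: the two conditions may be met by different annotations,
# which mirrors the behaviour described above.
line_level() {
  printf '%s\n' "$1" | tr ',' '\n' | awk -F'|' -v genes="$2" '
    BEGIN { n = split(genes, g, " "); for (i = 1; i <= n; i++) want[g[i]] = 1 }
    $3 == "HIGH" { high = 1 }
    ($4 in want) { hit = 1 }
    END { if (high && hit) print "match"; else print "no match" }'
}

# Per-annotation test: both conditions must hold for the same annotation.
per_annotation() {
  printf '%s\n' "$1" | tr ',' '\n' | awk -F'|' -v genes="$2" '
    BEGIN { n = split(genes, g, " "); for (i = 1; i <= n; i++) want[g[i]] = 1 }
    $3 == "HIGH" && ($4 in want) { both = 1 }
    END { if (both) print "match"; else print "no match" }'
}

line_level "$ann" "$genes"       # prints "match" (the unwanted hit)
per_annotation "$ann" "$genes"   # prints "no match" (what I actually want)
```

The second function is the behaviour I am after: a line should only be kept when a single annotation is both HIGH impact and in one of my genes.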

Is there a way to select only the lines that contain a HIGH impact variant in one of my genes (i.e. not lines where some other gene has a HIGH impact variant that overlaps with one of my genes)?

19 months ago
Richard ▴ 580

I found this script, which splits the VCF annotations into one per line. Piping its output into the filter command above gets me where I need to be:

cat my.vcf | snpEff-4.3/scripts/vcfEffOnePerLine.pl | java -jar snpEff-4.3/SnpSift.jar filter -s gene_list_20200218.txt "(  (EFF[*].IMPACT = 'HIGH') & ANN[*].GENE in SET[0] )"
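
For anyone wondering why this works: once every line carries a single annotation, the two `[*]` conditions can only be satisfied by that one annotation. Below is a rough awk sketch of the splitting idea. It is not the actual vcfEffOnePerLine.pl script; it assumes a tab-separated VCF with an `ANN=` entry in the INFO column (column 8), and the function name `one_per_line` is made up. Unlike the real script it moves ANN to the end of INFO.

```shell
# Emit one output line per comma-separated ANN annotation (toy version).
one_per_line() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { print; next }                      # pass header lines through
    {
      n = split($8, info, ";")
      ann = ""; rest = ""
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^ANN=/) ann = substr(info[i], 5)
        else rest = (rest == "" ? info[i] : rest ";" info[i])
      }
      if (ann == "") { print; next }          # no ANN entry: keep line as is
      m = split(ann, a, ",")
      for (j = 1; j <= m; j++) {              # repeat the line per annotation
        $8 = (rest == "" ? "" : rest ";") "ANN=" a[j]
        print
      }
    }'
}

# Example with a made-up record carrying two annotations:
printf 'chr1\t100\t.\tC\tT\t.\t.\tDP=10;ANN=T|stop_gained|HIGH|GENE_A|id1,T|missense_variant|MODERATE|GENE_B|id2\n' | one_per_line
```

Note that the same variant can now appear on several lines, so downstream counts are per annotation rather than per variant.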