I am trying to filter a VCF file with SnpSift, but I find it somewhat hard to understand.
I have the following VCF headers:
. . .
##INFO=<ID=EFF,Number=.,Type=String,Description="Predicted effects for this variant.Format: 'Effect ( Effect_Impact | Functional_Class | Codon_Change | Amino_Acid_Change| Amino_Acid_length | Gene_Name | Transcript_BioType | Gene_Coding | Transcript_ID | Exon_Rank | Genotype [ | ERRORS | WARNINGS ] )' ">
##SnpEffCmd="SnpEff -lof GRCh37.75 pSS_refseq.vcf "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
. . .
For example, I tried to extract all "intergenic_region" effects with this:
SnpSift.jar filter "(EFF[*].EFFECT = 'intergenic_region')" testsample.vcf > genes2.vcf
However, this does not work when I want to query only the EFF field and NOT the ANN field. I have observed that SnpSift searches the ANN field instead of the EFF field, so it finds no hits. In the ANN field these variants are annotated as "downstream_gene_variant" rather than "intergenic_region". If I run the same command with that value instead:
SnpSift.jar filter "(EFF[*].EFFECT = 'downstream_gene_variant')" testsample.vcf > genes2.vcf
Then it works, which suggests that SnpSift is looking in the ANN field rather than the EFF field.
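
To make clear what result I am after, here is a minimal Python sketch of the filter I expect. It is not SnpSift and does not use any SnpSift API; the function names eff_effects and filter_by_eff are my own, made up for illustration. It assumes the EFF value is a comma-separated list of Effect(...) entries with no commas inside the parentheses, and that INFO is the 8th tab-separated column.

```python
def eff_effects(info):
    """Return the effect names found in the EFF key of a VCF INFO string.

    Assumption: EFF entries look like Effect(Impact|Class|...) and are
    separated by commas, with no commas inside the parentheses.
    """
    for entry in info.split(";"):
        if entry.startswith("EFF="):
            items = entry[len("EFF="):].split(",")
            # Keep only the text before the opening parenthesis.
            return [item.split("(", 1)[0].strip() for item in items]
    return []


def filter_by_eff(lines, effect):
    """Yield header lines plus records whose EFF field lists `effect`.

    The ANN key is ignored on purpose: only EFF is inspected.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # INFO is the 8th column (index 7) of a VCF record.
        if len(fields) > 7 and effect in eff_effects(fields[7]):
            yield line
```

Passing the lines of testsample.vcf and 'intergenic_region' to filter_by_eff should keep exactly the records I want, regardless of what the ANN field says for the same variant.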