Hi,
I am trying to filter a VCF file with SnpSift, but I find it somewhat hard to understand.
I have the following VCF headers:
.
.
.
##INFO=<ID=EFF,Number=.,Type=String,Description="Predicted effects for this variant.Format: 'Effect ( Effect_Impact | Functional_Class | Codon_Change | Amino_Acid_Change| Amino_Acid_length | Gene_Name | Transcript_BioType | Gene_Coding | Transcript_ID | Exon_Rank | Genotype [ | ERRORS | WARNINGS ] )' ">
##SnpEffCmd="SnpEff -lof GRCh37.75 pSS_refseq.vcf "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
.
.
.
For example, I tried to extract all "intergenic_region" effects with this command:
SnpSift.jar filter "(EFF[*].EFFECT = 'intergenic_region')" testsample.vcf > genes2.vcf
However, this does not work when I want to query only the EFF field and not the ANN field. I have observed that SnpSift searches the ANN field instead of the EFF field, so I get no hits. In the ANN field I have "downstream_gene_variant" instead of "intergenic_region". Issuing the same command like this:
SnpSift.jar filter "(EFF[*].EFFECT = 'downstream_gene_variant')" testsample.vcf > genes2.vcf
This works, which obviously means that SnpSift is looking in the ANN field.
Any ideas?
Thank you. But is SnpSift limited to predefined VCF fields? What if I renamed EFF to EFF_Custom: would it be possible to search in that field instead? Or, more generally, how does SnpSift search in custom-made fields?
In the new version of SnpEff, the ANN field follows a specification: http://snpeff.sourceforge.net/VCFannotationformat_v1.0.pdf. The old EFF field does not look the same at all, so you cannot use a tool that expects one format with the other format.
Well, this sucks, period.
I'm going back to Python...
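For anyone who ends up taking the Python route, here is a minimal sketch of filtering on the EFF field only, ignoring ANN. It assumes the old EFF layout from the header above, i.e. comma-separated entries of the form Effect(Impact|...), and that the INFO column is the eighth tab-separated column. The function names (info_to_dict, eff_effects, filter_vcf_lines) are made up for this example and are not part of SnpSift or any library. The key argument is there so the same code could read a renamed field such as EFF_Custom, provided it keeps the same syntax.

```python
import re

# One EFF entry looks like "Effect(Impact|Functional_Class|...)".
# Assumption: effect names contain no commas or parentheses.
EFF_ENTRY = re.compile(r"([^,()]+)\(([^)]*)\)")


def info_to_dict(info):
    """Split a VCF INFO column into a {key: value} dict (flags map to True)."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def eff_effects(info, key="EFF"):
    """Return the effect names listed under the given INFO key (EFF-style syntax)."""
    value = info_to_dict(info).get(key)
    if not isinstance(value, str):
        return []
    return [match.group(1).strip() for match in EFF_ENTRY.finditer(value)]


def filter_vcf_lines(lines, wanted, key="EFF"):
    """Yield header lines unchanged, plus records whose EFF-style field has the wanted effect."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) > 7 and wanted in eff_effects(columns[7], key):
            yield line


# Possible usage, with the file names from the question:
#
# with open("testsample.vcf") as src, open("genes2.vcf", "w") as dst:
#     dst.writelines(filter_vcf_lines(src, "intergenic_region"))
```

Because the ANN value has no Effect(...) entries and is stored under a different INFO key, it is never consulted here, so a record is kept only when the EFF field itself lists the requested effect.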