Question: SnpSift filtering help
eXpander (Sweden) wrote 3.6 years ago:

Hi,

I am trying to filter a VCF file with SnpSift, but I find it somewhat hard to understand.

I have the following VCF headers:

.
.
.
##INFO=<ID=EFF,Number=.,Type=String,Description="Predicted effects for this variant.Format: 'Effect ( Effect_Impact | Functional_Class | Codon_Change | Amino_Acid_Change| Amino_Acid_length | Gene_Name | Transcript_BioType | Gene_Coding | Transcript_ID | Exon_Rank  | Genotype [ | ERRORS | WARNINGS ] )' ">
##SnpEffCmd="SnpEff  -lof GRCh37.75 pSS_refseq.vcf "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
.
.
.

For example, I tried to extract all "intergenic_region" effects like this:

SnpSift.jar filter "(EFF[*].EFFECT = 'intergenic_region')" testsample.vcf > genes2.vcf

However, this does not work when I want to query only the EFF field, not the ANN field. I have observed that SnpSift searches the ANN field instead of the EFF field, so there are no hits. In the ANN field I have "downstream_gene_variant" instead of "intergenic_region". Issuing the same command like this:

SnpSift.jar filter "(EFF[*].EFFECT = 'downstream_gene_variant')" testsample.vcf > genes2.vcf

Then it works, which suggests that SnpSift is looking in the ANN field.

Any ideas?

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote 3.6 years ago:

Your VCF was processed with an old SnpEff version:

http://snpeff.sourceforge.net/

The new SnpEff release looks for the ANN tag:

Important: This version implements the new VCF annotation standard 'ANN' field. Latest version 4.3i (2016-12-15)

Try using an older version of SnpSift, or annotate your VCF with the new version of SnpEff.

Or (dirty solution), just use grep:

grep -E '(^#|downstream_gene_variant)' your.vcf
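
Note that the grep above matches the term anywhere on the line, so it cannot tell EFF from ANN. If the match has to be restricted to the EFF key, something like the following Python sketch might do. It assumes the EFF layout described in the header above, `Effect(Impact|...)` items separated by commas, and that no subfield inside the parentheses contains a comma. The function names `eff_effects` and `filter_by_eff` are made up for illustration and are not part of SnpSift.

```python
def eff_effects(info):
    """Return the effect names listed under the EFF key of a VCF INFO string."""
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key == "EFF" and sep:
            # Assumed layout: Effect(Impact|Class|...),Effect(...)
            # Keep only the text before the opening parenthesis of each item.
            return [item.split("(", 1)[0].strip() for item in value.split(",")]
    return []


def filter_by_eff(lines, wanted):
    """Yield header lines plus the records whose EFF key lists `wanted`."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        # INFO is the 8th tab-separated column of a VCF record.
        if len(columns) > 7 and wanted in eff_effects(columns[7]):
            yield line


# Possible usage (file names are placeholders):
# with open("testsample.vcf") as src, open("genes2.vcf", "w") as dst:
#     dst.writelines(filter_by_eff(src, "intergenic_region"))
```

Because only the EFF key is inspected, a record whose ANN entry says "intergenic_region" but whose EFF entry does not is skipped.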

eXpander replied 3.6 years ago:

Thank you. But is SnpSift limited to predefined VCF fields? What if I rename EFF to EFF_Custom: would it be possible to search that field instead? More generally, how does SnpSift search custom fields?

Pierre Lindenbaum replied 3.6 years ago:

In the new version of SnpEff, the ANN field follows a specification: http://snpeff.sourceforge.net/VCFannotationformat_v1.0.pdf. The old EFF field does not look the same at all, so you cannot use one tool with the other format.
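
To illustrate how different the two layouts are: an ANN entry is a pipe-separated record with the annotation term in the second position ('Allele | Annotation | Annotation_Impact | ...' in the header above), whereas EFF puts the effect name in front of a parenthesised list. A rough Python sketch for pulling the annotation terms out of ANN could look like this. The helper name `ann_annotations` is hypothetical, and the splitting on "&" assumes that combined terms are joined that way, as the ANN specification describes.

```python
def ann_annotations(info):
    """Return the Annotation terms of every ANN entry in a VCF INFO string."""
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key == "ANN" and sep:
            names = []
            # Entries are comma-separated; subfields inside an entry use "|".
            for item in value.split(","):
                subfields = item.split("|")
                if len(subfields) > 1:
                    # The second subfield is the Annotation; several terms
                    # may be combined with "&" (assumption from the ANN spec).
                    names.extend(subfields[1].strip().split("&"))
            return names
    return []


# Possible usage on the INFO column of a single record:
# ann_annotations("ANN=G|downstream_gene_variant|MODIFIER|GENE1")
```

A parser written for one layout returns nothing useful on the other, which is why the same filter expression gives different hits depending on which field the tool reads.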

eXpander replied 3.6 years ago:

Well, this sucks, period.

I'm going back to Python...