Question: SnpSift doesn't show any VariantType for filtering
2.3 years ago by Vasu (420):

Hello,

I'm filtering a VCF file with SnpSift and would like to count the mutations that alter transcription factor binding sites (TFBS). See Table 1 of this paper: https://www.frontiersin.org/articles/10.3389/fgene.2012.00100/full#h7 (table image: https://www.frontiersin.org/files/Articles/18778/fgene-03-00100-HTML/image_m/fgene-03-00100-t001.jpg)

I would like to get something like this.

I ran several commands to add annotations, and the VCF file now looks like the following. It has "TF_binding_site_variant" in the ANN field and a VARTYPE showing SNP/DEL/INS/MNP.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       100225517       MU3692753       A       G       .       .       CONSEQUENCE=FRRS1|ENSG00000156869|1|FRRS1-001|ENST00000287474||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-004|ENST00000370176||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-201|ENST00000414213||intron_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|LOW|||FOXA2|MA0047.2|||n.100225517T>C||||||,G|TF_binding_site_variant|LOW|||FOXA1|MA0148.1|||n.100225517T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000287474|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000414213|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000370176|retained_intron|1/2|n.25+6646T>C||||||;SNP;HOM;VARTYPE=SNP
1       100274466       MU2855033       T       C       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1|||n.100274466A>G||||||,C|intergenic_region|MODIFIER|Y_RNA-AL451051.1|ENSG00000202254-ENSG00000252226|intergenic_region|ENSG00000202254-ENSG00000252226|||n.100274466T>C||||||;SNP;HOM;VARTYPE=SNP
1       101774964       MU78905029      T       G       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101774964T>G||||||,G|intergenic_region|MODIFIER|PPIAP7-RP11-157N3.1|ENSG00000173810-ENSG00000231671|intergenic_region|ENSG00000173810-ENSG00000231671|||n.101774964T>G||||||;SNP;HOM;VARTYPE=SNP
1       101774966       MU3316414       A       C       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101774966A>C||||||,C|intergenic_region|MODIFIER|PPIAP7-RP11-157N3.1|ENSG00000173810-ENSG00000231671|intergenic_region|ENSG00000173810-ENSG00000231671|||n.101774966A>C||||||;SNP;HOM;VARTYPE=SNP

I checked a few filtering steps in the documentation, but couldn't find anything that reports the number of mutations of each type that affect TFBS.

I tried something like this, but it didn't work (just to check how many variants of type deletion alter transcription factor binding sites):

cat input.vcf | java -jar SnpSift.jar filter "((exists DEL) & (ANN[*].EFFECT)" > eg.vcf

I need help with this. Thank you!

modified 2.3 years ago by Biostar (20) • written 2.3 years ago by Vasu (420)

Maybe I'm wrong, but I don't think snpEff/SnpSift can annotate a VCF at this level of precision (e.g. a "TFB context"). Those tools can "just" do some basic annotation, e.g. the terms under: http://www.sequenceontology.org/browser/release_2.5/term/SO:0001564

written 2.3 years ago by Pierre Lindenbaum (127k)

But you can see in the above few lines from vcf - ANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1|||n.100274466A>G||||||,C|intergenic_region|MODIFIER|Y_RNA-AL451051.1|ENSG00000202254-ENSG00000252226|intergenic_region|ENSG00000202254-ENSG00000252226|||n.100274466T>C||||||;SNP;HOM;VARTYPE=SNP

Here, [TF_binding_site_variant|LOW|||Srf|MA0083.1] corresponds to motif MA0083.1 (Srf), which you can look up in the JASPAR database.

So I would like to count the number of mutations of each type that alter a TFBS or motif.

You can check this in the SnpEff documentation under Additional Annotations, subheading Motif (http://snpeff.sourceforge.net/SnpEff_manual.html#run).
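
The ANN layout described above (effect in the second `|`-separated field, TF name in the sixth, motif ID in the seventh, as in the records shown) could be tallied per motif and variant type with plain awk. This is only a sketch: the file name `demo_motif.vcf` and its three shortened records are made up for illustration, and it assumes INFO is the eighth tab-separated column and carries a `VARTYPE=` key.

```shell
# Hypothetical, shortened records; real ANN entries carry more fields.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  '#CHROM' POS ID REF ALT QUAL FILTER INFO \
  1 100 a A G . . 'ANN=G|TF_binding_site_variant|LOW|||FOXA2|MA0047.2,G|TF_binding_site_variant|LOW|||FOXA1|MA0148.1,G|intron_variant|MODIFIER|FRRS1;SNP;VARTYPE=SNP' \
  1 200 b AT A . . 'ANN=A|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1;DEL;VARTYPE=DEL' \
  1 300 c T C . . 'ANN=C|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1;SNP;VARTYPE=SNP' \
  > demo_motif.vcf

motif_counts=$(awk -F'\t' '
  /^#/ { next }                          # skip header lines
  {
    vt = "NA"; ann = ""
    n = split($8, kv, ";")               # INFO is a ;-separated list of keys
    for (i = 1; i <= n; i++) {
      if (kv[i] ~ /^VARTYPE=/) vt  = substr(kv[i], 9)
      if (kv[i] ~ /^ANN=/)     ann = substr(kv[i], 5)
    }
    m = split(ann, entry, ",")           # one ANN entry per annotation
    for (j = 1; j <= m; j++) {
      split(entry[j], f, "[|]")
      if (f[2] == "TF_binding_site_variant")
        count[f[7] "\t" f[6] "\t" vt]++  # key: motif ID, TF name, variant type
    }
  }
  END { for (k in count) printf "%s\t%d\n", k, count[k] }
' demo_motif.vcf | LC_ALL=C sort)

printf '%s\n' "$motif_counts"
```

Each output row is motif ID, TF name, variant type and the number of annotations. A record hitting two motifs contributes one count to each.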

modified 2.3 years ago • written 2.3 years ago by Vasu (420)

"But you can see in the above few lines from vcf"

ok so I'm wrong :-)

written 2.3 years ago by Pierre Lindenbaum (127k)

"I tried something like this but didn't work"

This is never a good description. What did you expect? What result do you get instead?

Please post a full VCF example including the header.

fin swimmer

modified 2.3 years ago • written 2.3 years ago by finswimmer (13k)

@OP: All the example VCF records you posted above are SNVs, and I am not sure any of these SNVs would be classed as a deletion. You should be looking at the INDELs in your VCF. Here is an example filter that worked on your example annotation using SnpSift:

output:

 $ java -jar /opt/snpEff/SnpSift.jar filter "ANN[*].EFFECT has 'intron_variant'" snpeff_result.vcf 

##SnpSiftVersion="SnpSift 4.3t (build 2017-11-24 10:18), by Pablo Cingolani"
##SnpSiftCmd="SnpSift Filter 'ANN[*].EFFECT has 'intron_variant'' snpeff_result.vcf"
##FILTER=<ID=SnpSift,Description="SnpSift 4.3t (build 2017-11-24 10:18), by Pablo Cingolani, Expression used: ANN[*].EFFECT has 'intron_variant'">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1   100225517   MU3692753   A   G   .   .   CONSEQUENCE=FRRS1|ENSG00000156869|1|FRRS1-001|ENST00000287474||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-004|ENST00000370176||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-201|ENST00000414213||intron_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|LOW|||FOXA2|MA0047.2|||n.100225517T>C||||||,G|TF_binding_site_variant|LOW|||FOXA1|MA0148.1|||n.100225517T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000287474|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000414213|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000370176|retained_intron|1/2|n.25+6646T>C||||||;SNP;HOM;VARTYPE=SNP

input:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1   100225517   MU3692753   A   G   .   .   CONSEQUENCE=FRRS1|ENSG00000156869|1|FRRS1-001|ENST00000287474||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-004|ENST00000370176||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-201|ENST00000414213||intron_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|LOW|||FOXA2|MA0047.2|||n.100225517T>C||||||,G|TF_binding_site_variant|LOW|||FOXA1|MA0148.1|||n.100225517T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000287474|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000414213|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000370176|retained_intron|1/2|n.25+6646T>C||||||;SNP;HOM;VARTYPE=SNP
1   100274466   MU2855033   T   C   .   .   CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1|||n.100274466A>G||||||,C|intergenic_region|MODIFIER|Y_RNA-AL451051.1|ENSG00000202254-ENSG00000252226|intergenic_region|ENSG00000202254-ENSG00000252226|||n.100274466T>C||||||;SNP;HOM;VARTYPE=SNP
1   101774964   MU78905029  T   G   .   .   CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101774964T>G||||||,G|intergenic_region|MODIFIER|PPIAP7-RP11-157N3.1|ENSG00000173810-ENSG00000231671|intergenic_region|ENSG00000173810-ENSG00000231671|||n.101774964T>G||||||;SNP;HOM;VARTYPE=SNP
1   101774966   MU3316414   A   C   .   .   CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101774966A>C||||||,C|intergenic_region|MODIFIER|PPIAP7-RP11-157N3.1|ENSG00000173810-ENSG00000231671|intergenic_region|ENSG00000173810-ENSG00000231671|||n.101774966A>C||||||;SNP;HOM;VARTYPE=SNP

modified 2.3 years ago • written 2.3 years ago by cpad0112 (12k)

Yes, I do see that in the SnpEff documentation. But I want to find which mutations alter TFBS/motif

written 2.3 years ago by Vasu (420)

Since you are looking for numbers (not records, if I understand correctly), just grep and count (on the OP's records, it should give 2):

$ grep -wc "TF_binding_site_variant" input.vcf
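
Note that `grep -c` counts matching lines (records), not matches, so a record annotated with two motifs is counted once. A small sketch of the difference, using a made-up file `demo_grep.vcf` with shortened, hypothetical records:

```shell
# Hypothetical records: the first hits two motifs, the second one, the third none.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  '#CHROM' POS ID REF ALT QUAL FILTER INFO \
  1 100 a A G . . 'ANN=G|TF_binding_site_variant|LOW|||FOXA2|MA0047.2,G|TF_binding_site_variant|LOW|||FOXA1|MA0148.1;SNP;VARTYPE=SNP' \
  1 200 b T C . . 'ANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1;SNP;VARTYPE=SNP' \
  1 300 c T G . . 'ANN=G|intergenic_region|MODIFIER;SNP;VARTYPE=SNP' \
  > demo_grep.vcf

# -c counts lines containing the term: one per record.
records=$(grep -wc 'TF_binding_site_variant' demo_grep.vcf)

# -o prints every match on its own line, so wc -l counts annotations.
annotations=$(grep -ow 'TF_binding_site_variant' demo_grep.vcf | wc -l | tr -d ' ')

printf 'records=%s annotations=%s\n' "$records" "$annotations"
```

On this demo file the two numbers differ (2 records, 3 annotations). Header lines that happen to mention the term would also be counted, so on a real file it may be safer to drop lines starting with `#` first.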

If you are looking for records, use the following filter on the OP's VCF (two records will be listed):

 $  java -jar /opt/snpEff/SnpSift.jar filter "ANN[*].EFFECT has 'TF_binding_site_variant'" snpeff.vcf

output using OP records:

##SnpSiftVersion="SnpSift 4.3t (build 2017-11-24 10:18), by Pablo Cingolani"
##SnpSiftCmd="SnpSift Filter 'ANN[*].EFFECT has 'TF_binding_site_variant'' snpeff.vcf"
##FILTER=<ID=SnpSift,Description="SnpSift 4.3t (build 2017-11-24 10:18), by Pablo Cingolani, Expression used: ANN[*].EFFECT has 'TF_binding_site_variant'">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1   100225517   MU3692753   A   G   .   .   CONSEQUENCE=FRRS1|ENSG00000156869|1|FRRS1-001|ENST00000287474||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-004|ENST00000370176||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-201|ENST00000414213||intron_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|LOW|||FOXA2|MA0047.2|||n.100225517T>C||||||,G|TF_binding_site_variant|LOW|||FOXA1|MA0148.1|||n.100225517T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000287474|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000414213|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000370176|retained_intron|1/2|n.25+6646T>C||||||;SNP;HOM;VARTYPE=SNP
1   100274466   MU2855033   T   C   .   .   CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1|||n.100274466A>G||||||,C|intergenic_region|MODIFIER|Y_RNA-AL451051.1|ENSG00000202254-ENSG00000252226|intergenic_region|ENSG00000202254-ENSG00000252226|||n.100274466T>C||||||;SNP;HOM;VARTYPE=SNP

If you would like to filter for any variant with a TF_binding effect, use:

$  java -jar /opt/snpEff/SnpSift.jar filter "ANN[*].EFFECT =~ 'TF_binding'" snpeff.vcf
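
To restrict the effect filter to one variant type, an `exists` test on the type flag can be combined with the `has` test. The loop below is a sketch under assumptions: it expects the types to be present as bare INFO flags (as `SNP` is in the records above), the jar path and the file name `snpeff.vcf` are the ones used in this thread and may differ locally, and `tfbs_expr` is a hypothetical helper name.

```shell
# Assumed locations; override via the environment if they differ.
SNPSIFT_JAR="${SNPSIFT_JAR:-/opt/snpEff/SnpSift.jar}"
VCF="${VCF:-snpeff.vcf}"

# Build the filter expression for one variant-type flag (e.g. DEL).
tfbs_expr() {
  printf "(exists %s) & (ANN[*].EFFECT has 'TF_binding_site_variant')" "$1"
}

if [ -f "$SNPSIFT_JAR" ] && [ -f "$VCF" ]; then
  for vt in SNP MNP INS DEL; do
    # Count the non-header lines that pass the filter.
    n=$(java -jar "$SNPSIFT_JAR" filter "$(tfbs_expr "$vt")" "$VCF" | grep -vc '^#')
    printf '%s\t%s\n' "$vt" "$n"
  done
else
  echo "SnpSift.jar or input VCF not found; skipping" >&2
fi
```

Each output line would be a variant type and the number of records of that type carrying a TF_binding_site_variant annotation.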

modified 2.3 years ago • written 2.3 years ago by cpad0112 (12k)

No, this is not what I mean. You can see that the input also has VARTYPE = SNP/INS/DEL/MNP. What I want is to count the number of variants of each type that alter a TFBS/motif. It should give something like this (see the first two columns): https://www.frontiersin.org/files/Articles/18778/fgene-03-00100-HTML/image_m/fgene-03-00100-t001.jpg

written 2.3 years ago by Vasu (420)

If you are looking for a summary, then look into the summary.html from snpEff.
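
A per-type tally can also be sketched directly on the annotated VCF with awk. This is not a SnpSift feature, just text processing: it assumes INFO is the eighth tab-separated column and contains a `VARTYPE=` key, and the file `demo_tally.vcf` with its shortened records is made up for illustration.

```shell
# Hypothetical records: three carry a TFBS annotation, one does not.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  '#CHROM' POS ID REF ALT QUAL FILTER INFO \
  1 100 a A G . . 'ANN=G|TF_binding_site_variant|LOW|||FOXA2|MA0047.2;SNP;VARTYPE=SNP' \
  1 200 b AT A . . 'ANN=A|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1;DEL;VARTYPE=DEL' \
  1 300 c T C . . 'ANN=C|intergenic_region|MODIFIER;SNP;VARTYPE=SNP' \
  1 400 d T G . . 'ANN=G|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1;SNP;VARTYPE=SNP' \
  > demo_tally.vcf

tally=$(awk -F'\t' '
  /^#/ { next }                                         # skip header lines
  index($8, "|TF_binding_site_variant|") == 0 { next }  # keep TFBS-annotated records
  {
    vt = "NA"
    n = split($8, kv, ";")
    for (i = 1; i <= n; i++)
      if (kv[i] ~ /^VARTYPE=/) vt = substr(kv[i], 9)    # text after "VARTYPE="
    count[vt]++
  }
  END { for (vt in count) printf "%s\t%d\n", vt, count[vt] }
' demo_tally.vcf | LC_ALL=C sort)

printf '%s\n' "$tally"
```

Each record is counted once, whatever the number of motifs it hits, which matches a "number of variants per type" table rather than a "number of annotations" one.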

written 2.3 years ago by cpad0112 (12k)