SnpSift doesn't show any VariantType for filtering
3.8 years ago
Vasu ▴ 600

Hello,

I'm using SnpSift to filter a VCF file. I would like to get counts of the mutations that alter transcription factor binding sites (TFBS). See this paper: https://www.frontiersin.org/articles/10.3389/fgene.2012.00100/full#h7, in particular Table 1 (https://www.frontiersin.org/files/Articles/18778/fgene-03-00100-HTML/image_m/fgene-03-00100-t001.jpg).

I would like to get something like this.

I ran several commands to add annotations, and the VCF file now looks like the following. It has "TF_binding_site_variant" annotations and a VARTYPE field showing SNP/DEL/IND/MNP.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       100225517       MU3692753       A       G       .       .       CONSEQUENCE=FRRS1|ENSG00000156869|1|FRRS1-001|ENST00000287474||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-004|ENST00000370176||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-201|ENST00000414213||intron_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|LOW|||FOXA2|MA0047.2|||n.100225517T>C||||||,G|TF_binding_site_variant|LOW|||FOXA1|MA0148.1|||n.100225517T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000287474|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000414213|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000370176|retained_intron|1/2|n.25+6646T>C||||||;SNP;HOM;VARTYPE=SNP
1       100274466       MU2855033       T       C       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1|||n.100274466A>G||||||,C|intergenic_region|MODIFIER|Y_RNA-AL451051.1|ENSG00000202254-ENSG00000252226|intergenic_region|ENSG00000202254-ENSG00000252226|||n.100274466T>C||||||;SNP;HOM;VARTYPE=SNP
1       101774964       MU78905029      T       G       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101774964T>G||||||,G|intergenic_region|MODIFIER|PPIAP7-RP11-157N3.1|ENSG00000173810-ENSG00000231671|intergenic_region|ENSG00000173810-ENSG00000231671|||n.101774964T>G||||||;SNP;HOM;VARTYPE=SNP
1       101774966       MU3316414       A       C       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101774966A>C||||||,C|intergenic_region|MODIFIER|PPIAP7-RP11-157N3.1|ENSG00000173810-ENSG00000231671|intergenic_region|ENSG00000173810-ENSG00000231671|||n.101774966A>C||||||;SNP;HOM;VARTYPE=SNP


I checked a few filtering steps in the documentation, but couldn't find anything that shows the number of each type of mutation affecting TFBS.

I tried something like this, but it didn't work (just to check how many variants of type deletion alter transcription factor binding sites):

cat input.vcf | java -jar SnpSift.jar filter "((exists DEL) & (ANN[*].EFFECT)" > eg.vcf


I need help with this. Thank you!

snpeff snpsift filtering mutations • 1.8k views

Maybe I'm wrong, but I don't think snpEff/SnpSift is able to annotate a VCF at this level of precision (e.g. a "TFBS context"). Those tools are "just" able to do some basic annotation, e.g. the terms under: http://www.sequenceontology.org/browser/release_2.5/term/SO:0001564


But you can see in the above few lines from vcf - ANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1|||n.100274466A>G||||||,C|intergenic_region|MODIFIER|Y_RNA-AL451051.1|ENSG00000202254-ENSG00000252226|intergenic_region|ENSG00000202254-ENSG00000252226|||n.100274466T>C||||||;SNP;HOM;VARTYPE=SNP

This means that [TF_binding_site_variant|LOW|||Srf|MA0083.1] corresponds to motif MA0083.1, which you can look up in the JASPAR database.
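
For anyone who wants to pull these pieces out programmatically, here is a minimal Python sketch. It assumes the usual snpEff ANN layout (entries separated by commas, sub-fields separated by `|`, with allele, effect and impact first) and that, for the motif entries in this file, the TF name and the JASPAR motif ID sit in the sixth and seventh sub-fields, as in the record quoted above. The function name `parse_ann` is made up for this illustration and is not part of snpEff or SnpSift.

```python
def parse_ann(info):
    """Split the ANN part of a VCF INFO string into one dict per annotation."""
    entries = []
    for field in info.split(";"):
        if not field.startswith("ANN="):
            continue
        for entry in field[len("ANN="):].split(","):
            parts = entry.split("|")
            parts += [""] * (7 - len(parts))  # pad short entries
            entries.append({
                "allele": parts[0],
                "effects": parts[1].split("&"),  # snpEff joins multiple effects with '&'
                "impact": parts[2],
                "feature_type": parts[5],  # TF name in the motif entries above
                "feature_id": parts[6],    # JASPAR motif ID in the motif entries above
            })
    return entries


# INFO string shortened from the second record in the question
info = (
    "mutation=T>C;"
    "ANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1|||n.100274466A>G||||||,"
    "C|intergenic_region|MODIFIER|Y_RNA-AL451051.1|ENSG00000202254-ENSG00000252226|"
    "intergenic_region|ENSG00000202254-ENSG00000252226|||n.100274466T>C||||||;"
    "SNP;HOM;VARTYPE=SNP"
)
motifs = [
    (ann["feature_type"], ann["feature_id"])
    for ann in parse_ann(info)
    if "TF_binding_site_variant" in ann["effects"]
]
print(motifs)  # [('Srf', 'MA0083.1')]
```

Splitting INFO on `;` first keeps the commas and pipes inside CONSEQUENCE from being mistaken for ANN entries.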

So, I would like to count the number of each type of mutation altering a TFBS or motif.

You can check this in the SnpEff documentation: Additional Annotations, under the Motif subheading (http://snpeff.sourceforge.net/SnpEff_manual.html#run)


But you can see in the above few lines from vcf -

ok so I'm wrong :-)


I tried something like this but didn't work

This is never a good description. What did you expect? What result did you get instead?

fin swimmer


@OP: All the example VCF records you furnished above are SNVs, and I am not sure that any of these SNVs leads to a deletion of anything. You should be looking at INDELs in your VCF. Example filtering that worked for an example annotation using SnpSift:

$ java -jar /opt/snpEff/SnpSift.jar filter "ANN[*].EFFECT has 'intron_variant'" snpeff_result.vcf

output:

##SnpSiftVersion="SnpSift 4.3t (build 2017-11-24 10:18), by Pablo Cingolani"
##SnpSiftCmd="SnpSift Filter 'ANN[*].EFFECT has 'intron_variant'' snpeff_result.vcf"
##FILTER=<ID=SnpSift,Description="SnpSift 4.3t (build 2017-11-24 10:18), by Pablo Cingolani, Expression used: ANN[*].EFFECT has 'intron_variant'">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       100225517       MU3692753       A       G       .       .       CONSEQUENCE=FRRS1|ENSG00000156869|1|FRRS1-001|ENST00000287474||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-004|ENST00000370176||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-201|ENST00000414213||intron_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|LOW|||FOXA2|MA0047.2|||n.100225517T>C||||||,G|TF_binding_site_variant|LOW|||FOXA1|MA0148.1|||n.100225517T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000287474|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000414213|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000370176|retained_intron|1/2|n.25+6646T>C||||||;SNP;HOM;VARTYPE=SNP

input:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       100225517       MU3692753       A       G       .       .       CONSEQUENCE=FRRS1|ENSG00000156869|1|FRRS1-001|ENST00000287474||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-004|ENST00000370176||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-201|ENST00000414213||intron_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|LOW|||FOXA2|MA0047.2|||n.100225517T>C||||||,G|TF_binding_site_variant|LOW|||FOXA1|MA0148.1|||n.100225517T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000287474|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000414213|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000370176|retained_intron|1/2|n.25+6646T>C||||||;SNP;HOM;VARTYPE=SNP
1       100274466       MU2855033       T       C       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1|||n.100274466A>G||||||,C|intergenic_region|MODIFIER|Y_RNA-AL451051.1|ENSG00000202254-ENSG00000252226|intergenic_region|ENSG00000202254-ENSG00000252226|||n.100274466T>C||||||;SNP;HOM;VARTYPE=SNP
1       101774964       MU78905029      T       G       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101774964T>G||||||,G|intergenic_region|MODIFIER|PPIAP7-RP11-157N3.1|ENSG00000173810-ENSG00000231671|intergenic_region|ENSG00000173810-ENSG00000231671|||n.101774964T>G||||||;SNP;HOM;VARTYPE=SNP
1       101774966       MU3316414       A       C       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101774966A>C||||||,C|intergenic_region|MODIFIER|PPIAP7-RP11-157N3.1|ENSG00000173810-ENSG00000231671|intergenic_region|ENSG00000173810-ENSG00000231671|||n.101774966A>C||||||;SNP;HOM;VARTYPE=SNP

Yes, I do see that in the SnpEff documentation. But I want to find which mutations alter TFBS/motif.

Since you are looking for numbers (not records, if I understand correctly), just do a grep and count (on OP records, it should give 2):

$ grep -wc "TF_binding_site_variant" input.vcf
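
One caveat with the grep count: `grep -c` counts every matching line, header lines included. In SnpSift filter output the `##SnpSiftCmd` and `##FILTER` headers repeat the filter expression, so grepping an already filtered file would overcount. A rough Python equivalent that only looks at data lines is sketched below; the function name `count_tfbs_records` and the second example record are made up for illustration.

```python
import re


def count_tfbs_records(lines, term="TF_binding_site_variant"):
    """Count VCF data lines (not '#' header lines) that contain `term` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b")
    return sum(
        1 for line in lines
        if not line.startswith("#") and pattern.search(line)
    )


example = [
    # header line that mentions the term (as in SnpSift filter output)
    "##SnpSiftCmd=\"SnpSift Filter 'ANN[*].EFFECT has 'TF_binding_site_variant'' snpeff.vcf\"",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    # shortened from a record in the question
    "1\t100274466\tMU2855033\tT\tC\t.\t.\tANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1;VARTYPE=SNP",
    # hypothetical record without a motif annotation
    "1\t5\tHYPO1\tA\tG\t.\t.\tANN=G|intron_variant|MODIFIER;VARTYPE=SNP",
]
print(count_tfbs_records(example))  # the header line is skipped, so this prints 1
```

To run it on a real file, pass an open file handle instead of the list, e.g. `count_tfbs_records(open("input.vcf"))`.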


If you are looking for records, use the following filter on the OP VCF (two records will be listed):

$ java -jar /opt/snpEff/SnpSift.jar filter "ANN[*].EFFECT has 'TF_binding_site_variant'" snpeff.vcf

output using OP records:

##SnpSiftVersion="SnpSift 4.3t (build 2017-11-24 10:18), by Pablo Cingolani"
##SnpSiftCmd="SnpSift Filter 'ANN[*].EFFECT has 'TF_binding_site_variant'' snpeff.vcf"
##FILTER=<ID=SnpSift,Description="SnpSift 4.3t (build 2017-11-24 10:18), by Pablo Cingolani, Expression used: ANN[*].EFFECT has 'TF_binding_site_variant'">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       100225517       MU3692753       A       G       .       .       CONSEQUENCE=FRRS1|ENSG00000156869|1|FRRS1-001|ENST00000287474||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-004|ENST00000370176||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-201|ENST00000414213||intron_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|LOW|||FOXA2|MA0047.2|||n.100225517T>C||||||,G|TF_binding_site_variant|LOW|||FOXA1|MA0148.1|||n.100225517T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000287474|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000414213|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000370176|retained_intron|1/2|n.25+6646T>C||||||;SNP;HOM;VARTYPE=SNP
1       100274466       MU2855033       T       C       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1|||n.100274466A>G||||||,C|intergenic_region|MODIFIER|Y_RNA-AL451051.1|ENSG00000202254-ENSG00000252226|intergenic_region|ENSG00000202254-ENSG00000252226|||n.100274466T>C||||||;SNP;HOM;VARTYPE=SNP

If you would like to filter any variant with a TF_binding effect, use:

$ java -jar /opt/snpEff/SnpSift.jar filter "ANN[*].EFFECT =~ 'TF_binding'" snpeff.vcf
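
The two filters above differ in how they compare the effect string. As a rough illustration only (this is a Python approximation, not SnpSift's own code), `has` can be thought of as an exact match against any of the `&`-separated values in the effect field, while `=~` is a regular-expression match that, judging by the command above, also hits on a partial string. The helper names and the combined-effect value in the example are made up for illustration.

```python
import re


def has_effect(effect_field, wanted):
    """Approximate `has`: exact match against any '&'-separated value."""
    return wanted in effect_field.split("&")


def matches_effect(effect_field, regex):
    """Approximate `=~`: regular-expression search anywhere in the value."""
    return re.search(regex, effect_field) is not None


# Full term: both styles match
print(has_effect("TF_binding_site_variant", "TF_binding_site_variant"))  # True
# Partial term: only the regex style matches
print(has_effect("TF_binding_site_variant", "TF_binding"))      # False
print(matches_effect("TF_binding_site_variant", "TF_binding"))  # True
# Combined effects (illustrative value, not taken from this thread)
print(has_effect("splice_region_variant&intron_variant", "intron_variant"))  # True
```

In practice this means `has` is the safer choice when you know the exact Sequence Ontology term, and `=~` is handy for catching a family of related terms.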


No, this is not what I mean. You can see that the input also has VARTYPE=SNP/IND/DEL/MNP. What I want is to count the number of variants of each type that alter a TFBS/motif. It should give something like this (see the first two columns): https://www.frontiersin.org/files/Articles/18778/fgene-03-00100-HTML/image_m/fgene-03-00100-t001.jpg


If you are looking for a summary, then look at the summary HTML report that snpEff writes (snpEff_summary.html by default).
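
If the summary report is not granular enough, the per-type tally asked for above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the VCF is tab-separated, every record carries a `VARTYPE=` entry in INFO (as in the question), and each record is counted once even when it hits several motifs. The function name `vartype_counts` and the two records marked hypothetical are made up for illustration.

```python
from collections import Counter


def vartype_counts(lines, effect="TF_binding_site_variant"):
    """Tally VARTYPE over records that have at least one ANN entry with `effect`."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip headers and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        # keep only key=value INFO entries; bare flags such as SNP or HOM are ignored
        info = dict(kv.split("=", 1) for kv in cols[7].split(";") if "=" in kv)
        effects = set()
        for entry in info.get("ANN", "").split(","):
            parts = entry.split("|")
            if len(parts) > 1:
                effects.update(parts[1].split("&"))
        if effect in effects:
            counts[info.get("VARTYPE", "UNKNOWN")] += 1
    return counts


records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    # two records shortened from the question
    "1\t100274466\tMU2855033\tT\tC\t.\t.\tANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1|||n.100274466A>G||||||;SNP;HOM;VARTYPE=SNP",
    "1\t101774964\tMU78905029\tT\tG\t.\t.\tANN=G|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101774964T>G||||||;SNP;HOM;VARTYPE=SNP",
    # hypothetical deletion inside a motif
    "1\t200\tHYPO1\tATG\tA\t.\t.\tANN=A|TF_binding_site_variant|LOW|||CTCF|MA0139.1||||||||;DEL;HOM;VARTYPE=DEL",
    # hypothetical insertion with no motif annotation
    "1\t300\tHYPO2\tA\tAT\t.\t.\tANN=AT|intron_variant|MODIFIER|G|ID|transcript|T1|protein_coding|1/2|||||||;INS;HOM;VARTYPE=INS",
]
for vartype, n in sorted(vartype_counts(records).items()):
    print(vartype, n)
```

To count per motif hit instead of per record, move the increment inside the loop over ANN entries.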