SnpSift doesn't show any VariantType for filtering
6.4 years ago
Vasu ▴ 770

Hello,

I'm filtering a VCF file with SnpSift and would like to count the mutations that alter transcription factor binding sites (TFBS). See this paper (https://www.frontiersin.org/articles/10.3389/fgene.2012.00100/full#h7), in particular Table 1 (https://www.frontiersin.org/files/Articles/18778/fgene-03-00100-HTML/image_m/fgene-03-00100-t001.jpg).

I would like to get something like this.

After running several annotation commands, the VCF file looks like the following. It has "TF_binding_site_variant" annotations and a VARTYPE field showing SNP/DEL/IND/MNP.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       100225517       MU3692753       A       G       .       .       CONSEQUENCE=FRRS1|ENSG00000156869|1|FRRS1-001|ENST00000287474||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-004|ENST00000370176||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-201|ENST00000414213||intron_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|LOW|||FOXA2|MA0047.2|||n.100225517T>C||||||,G|TF_binding_site_variant|LOW|||FOXA1|MA0148.1|||n.100225517T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000287474|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000414213|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000370176|retained_intron|1/2|n.25+6646T>C||||||;SNP;HOM;VARTYPE=SNP
1       100274466       MU2855033       T       C       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1|||n.100274466A>G||||||,C|intergenic_region|MODIFIER|Y_RNA-AL451051.1|ENSG00000202254-ENSG00000252226|intergenic_region|ENSG00000202254-ENSG00000252226|||n.100274466T>C||||||;SNP;HOM;VARTYPE=SNP
1       101774964       MU78905029      T       G       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101774964T>G||||||,G|intergenic_region|MODIFIER|PPIAP7-RP11-157N3.1|ENSG00000173810-ENSG00000231671|intergenic_region|ENSG00000173810-ENSG00000231671|||n.101774964T>G||||||;SNP;HOM;VARTYPE=SNP
1       101774966       MU3316414       A       C       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101774966A>C||||||,C|intergenic_region|MODIFIER|PPIAP7-RP11-157N3.1|ENSG00000173810-ENSG00000231671|intergenic_region|ENSG00000173810-ENSG00000231671|||n.101774966A>C||||||;SNP;HOM;VARTYPE=SNP

I checked a few filtering steps in the documentation, but couldn't find anything that shows the number of mutations of each type that affect TFBS.

I tried something like this but it didn't work (just as a check: how many variants of type deletion alter transcription factor binding sites):

cat input.vcf | java -jar SnpSift.jar filter "((exists DEL) & (ANN[*].EFFECT)" > eg.vcf

I need help with this. Thank you!

snpeff snpsift filtering mutations • 3.0k views

Maybe I'm wrong, but I don't think snpEff/SnpSift can annotate a VCF at this level of precision (e.g. a "TFBS context"). Those tools are "just" able to do some basic annotation, e.g. the terms under: http://www.sequenceontology.org/browser/release_2.5/term/SO:0001564


But you can see in the above few lines from vcf - ANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1|||n.100274466A>G||||||,C|intergenic_region|MODIFIER|Y_RNA-AL451051.1|ENSG00000202254-ENSG00000252226|intergenic_region|ENSG00000202254-ENSG00000252226|||n.100274466T>C||||||;SNP;HOM;VARTYPE=SNP

This means [TF_binding_site_variant|LOW|||Srf|MA0083.1] corresponds to motif MA0083.1, which you can look up in the JASPAR database.

So, I would like to count the number of mutations of each type that alter a TFBS or motif.

You can check this in the SnpEff documentation: Additional Annotations, then go to the Motif subheading (http://snpeff.sourceforge.net/SnpEff_manual.html#run)


But you can see in the above few lines from vcf -

ok so I'm wrong :-)


I tried something like this but didn't work

This is never a good description. What did you expect? What result did you get instead?

Please post a full VCF example including the header.

fin swimmer


@OP: All the example VCF records you furnished above are SNVs, and I am not sure how any of these SNVs could count as a deletion. You should be looking at INDELs in your VCF. An example filter that worked on your example annotation using SnpSift:

output:

 $ java -jar /opt/snpEff/SnpSift.jar filter "ANN[*].EFFECT has 'intron_variant'" snpeff_result.vcf 

##SnpSiftVersion="SnpSift 4.3t (build 2017-11-24 10:18), by Pablo Cingolani"
##SnpSiftCmd="SnpSift Filter 'ANN[*].EFFECT has 'intron_variant'' snpeff_result.vcf"
##FILTER=<ID=SnpSift,Description="SnpSift 4.3t (build 2017-11-24 10:18), by Pablo Cingolani, Expression used: ANN[*].EFFECT has 'intron_variant'">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1   100225517   MU3692753   A   G   .   .   CONSEQUENCE=FRRS1|ENSG00000156869|1|FRRS1-001|ENST00000287474||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-004|ENST00000370176||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-201|ENST00000414213||intron_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|LOW|||FOXA2|MA0047.2|||n.100225517T>C||||||,G|TF_binding_site_variant|LOW|||FOXA1|MA0148.1|||n.100225517T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000287474|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000414213|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000370176|retained_intron|1/2|n.25+6646T>C||||||;SNP;HOM;VARTYPE=SNP

input:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1   100225517   MU3692753   A   G   .   .   CONSEQUENCE=FRRS1|ENSG00000156869|1|FRRS1-001|ENST00000287474||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-004|ENST00000370176||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-201|ENST00000414213||intron_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|LOW|||FOXA2|MA0047.2|||n.100225517T>C||||||,G|TF_binding_site_variant|LOW|||FOXA1|MA0148.1|||n.100225517T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000287474|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000414213|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000370176|retained_intron|1/2|n.25+6646T>C||||||;SNP;HOM;VARTYPE=SNP
1   100274466   MU2855033   T   C   .   .   CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1|||n.100274466A>G||||||,C|intergenic_region|MODIFIER|Y_RNA-AL451051.1|ENSG00000202254-ENSG00000252226|intergenic_region|ENSG00000202254-ENSG00000252226|||n.100274466T>C||||||;SNP;HOM;VARTYPE=SNP
1   101774964   MU78905029  T   G   .   .   CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101774964T>G||||||,G|intergenic_region|MODIFIER|PPIAP7-RP11-157N3.1|ENSG00000173810-ENSG00000231671|intergenic_region|ENSG00000173810-ENSG00000231671|||n.101774964T>G||||||;SNP;HOM;VARTYPE=SNP
1   101774966   MU3316414   A   C   .   .   CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101774966A>C||||||,C|intergenic_region|MODIFIER|PPIAP7-RP11-157N3.1|ENSG00000173810-ENSG00000231671|intergenic_region|ENSG00000173810-ENSG00000231671|||n.101774966A>C||||||;SNP;HOM;VARTYPE=SNP

Yes, I do see that in the SnpEff documentation. But I want to find which mutations alter TFBS/motif


Since you are looking for numbers (not records, if I understand correctly), just do a grep and count (on OP records, it should give 2):

$ grep -wc "TF_binding_site_variant" input.vcf
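
Note that grep -c counts matching lines (records), not individual annotations, so a record that hits two motifs is still counted once. A minimal sketch of both counts, assuming GNU or BSD grep (for -o) and a VCF with "#" header lines; the function names are made up for illustration:

```shell
# Count VCF records (lines) carrying at least one TF_binding_site_variant annotation.
count_tfbs_records() {
    grep -v '^#' "$1" | grep -wc 'TF_binding_site_variant'
}

# Count individual TF_binding_site_variant annotations (one per motif hit).
count_tfbs_annotations() {
    grep -v '^#' "$1" | grep -wo 'TF_binding_site_variant' | wc -l | tr -d ' '
}

# Usage:
#   count_tfbs_records input.vcf
#   count_tfbs_annotations input.vcf
```

The header lines are dropped first so that a SnpSiftCmd or FILTER header mentioning the term does not inflate the count.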

If you are looking for records, use the following filter on the OP VCF (two records will be listed):

 $  java -jar /opt/snpEff/SnpSift.jar filter "ANN[*].EFFECT has 'TF_binding_site_variant'" snpeff.vcf

output using OP records:

##SnpSiftVersion="SnpSift 4.3t (build 2017-11-24 10:18), by Pablo Cingolani"
##SnpSiftCmd="SnpSift Filter 'ANN[*].EFFECT has 'TF_binding_site_variant'' snpeff.vcf"
##FILTER=<ID=SnpSift,Description="SnpSift 4.3t (build 2017-11-24 10:18), by Pablo Cingolani, Expression used: ANN[*].EFFECT has 'TF_binding_site_variant'">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
1   100225517   MU3692753   A   G   .   .   CONSEQUENCE=FRRS1|ENSG00000156869|1|FRRS1-001|ENST00000287474||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-004|ENST00000370176||intron_variant||,FRRS1|ENSG00000156869|1|FRRS1-201|ENST00000414213||intron_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|LOW|||FOXA2|MA0047.2|||n.100225517T>C||||||,G|TF_binding_site_variant|LOW|||FOXA1|MA0148.1|||n.100225517T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000287474|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000414213|protein_coding|1/16|c.-106+5336T>C||||||,G|intron_variant|MODIFIER|FRRS1|ENSG00000156869|transcript|ENST00000370176|retained_intron|1/2|n.25+6646T>C||||||;SNP;HOM;VARTYPE=SNP
1   100274466   MU2855033   T   C   .   .   CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1|||n.100274466A>G||||||,C|intergenic_region|MODIFIER|Y_RNA-AL451051.1|ENSG00000202254-ENSG00000252226|intergenic_region|ENSG00000202254-ENSG00000252226|||n.100274466T>C||||||;SNP;HOM;VARTYPE=SNP

If you would like to filter any variant with a TF_binding effect, use:

$  java -jar /opt/snpEff/SnpSift.jar filter "ANN[*].EFFECT =~ 'TF_binding'" snpeff.vcf
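
To restrict such a filter to one variant type, the variant-type flag can be combined with the effect test. The expression in the question looks like it cannot parse, since its parentheses are unbalanced and ANN[*].EFFECT is not compared with anything. A minimal sketch, assuming the INFO field carries variant-type flags such as SNP or DEL (as in the records above) and that SnpSift.jar is at the path shown; tfbs_expr is a hypothetical helper that only builds the expression string:

```shell
# Build a SnpSift filter expression: variant-type flag present AND a TFBS effect.
# $1 is the flag name as it appears in INFO (e.g. SNP or DEL).
tfbs_expr() {
    printf "(exists %s) & (ANN[*].EFFECT has 'TF_binding_site_variant')" "$1"
}

# Usage (the jar path is an assumption; adjust to your install):
#   java -jar /opt/snpEff/SnpSift.jar filter "$(tfbs_expr DEL)" snpeff.vcf | grep -vc '^#'
```

The trailing grep -vc '^#' counts the records that pass the filter while skipping header lines.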

No, this is not what I mean. As you can see, the input also has VARTYPE = SNP/IND/DEL/MNP. What I want is to count the number of variants of each type altering a TFBS/motif. It should give something like this [see the first two columns - https://www.frontiersin.org/files/Articles/18778/fgene-03-00100-HTML/image_m/fgene-03-00100-t001.jpg]


If you are looking for a summary, then look into the summary.html produced by snpEff.
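
If the summary does not break TFBS hits down by variant type, a per-VARTYPE tally can be sketched with awk. This assumes a tab-delimited VCF whose INFO column (column 8) contains a VARTYPE=... entry and pipe-delimited ANN annotations, as in the records above; count_tfbs_by_vartype is a made-up name:

```shell
# Tally records with a TF_binding_site_variant annotation by their VARTYPE value.
# Prints "<VARTYPE><TAB><count>" lines, sorted by VARTYPE.
count_tfbs_by_vartype() {
    awk -F'\t' '
        /^#/ { next }                                # skip header lines
        $8 !~ /\|TF_binding_site_variant\|/ { next } # keep TFBS-annotated records only
        {
            n = split($8, kv, ";")                   # INFO entries are ";"-separated
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^VARTYPE=/) {
                    sub(/^VARTYPE=/, "", kv[i])
                    count[kv[i]]++
                }
        }
        END { for (t in count) printf "%s\t%s\n", t, count[t] }
    ' "$1" | sort
}

# Usage:
#   count_tfbs_by_vartype input.vcf
```

Each record is counted once per variant type, no matter how many motifs it hits.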
