Annotation of VCF file
6.4 years ago
Vasu ▴ 770

Hello,

This is the first time I'm working with a VCF file. The data in my file looks like the following. I see that VCF files annotated with SnpEff have an "ANN" field, but my file does not have it. I downloaded this file from ICGC. Do I need to re-annotate it with SnpEff to get all the other information?

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       100000409       MU1214865       G       A       .       .       CONSEQUENCE=||||||intergenic_region||,RP11-413P11.1|ENSG00000224445|1|RP11-413P11.1-001|ENST00000438829||upstream_gene_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=G>A;project_count=1;studies=PCAWG;tested_donors=12198
1       100001783       MU4631949       C       G       .       .       CONSEQUENCE=||||||intergenic_region||,RP11-413P11.1|ENSG00000224445|1|RP11-413P11.1-001|ENST00000438829||upstream_gene_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=C>G;project_count=1;studies=PCAWG;tested_donors=12198
1       100003664       MU78268308      C       T       .       .       CONSEQUENCE=||||||intergenic_region||,RP11-413P11.1|ENSG00000224445|1|RP11-413P11.1-001|ENST00000438829||upstream_gene_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=C>T;project_count=1;studies=PCAWG;tested_donors=12198
1       100007225       MU4631957       T       C       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>C;project_count=1;studies=PCAWG;tested_donors=12198
1       100008212       MU28770474      T       C       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>C;project_count=1;studies=PCAWG;tested_donors=12198

I want to check what percentage of mutations affect transcription factor binding sites (TFBS) and motifs. If I need to re-annotate, could you please suggest how to do it, and how I can identify the mutations affecting TFBS and motifs?

vcf mutation snpeff
If you need to add annotation, you can use SnpEff, VEP, or ANNOVAR. These tools have good documentation, so you should be able to work out how to use them.

It is already annotated. In the INFO field you can see the consequence, gene, transcript, studies, and mutation. I guess it was done with one of the Ensembl annotators, since the gene and transcript IDs are Ensembl (ENSG/ENST) entries; my guess is VEP. If you need more annotation, you can re-annotate with VEP using additional flags.
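As an illustration of what is already in the INFO column, here is a minimal Python sketch that pulls the consequence terms out of the ICGC `CONSEQUENCE` key. It assumes the entries are comma-separated and pipe-delimited, with the gene symbol at position 0, the transcript ID at position 4 and the consequence term at position 6; that layout is inferred only from the example lines in the question, not from an ICGC specification. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into key -> value (flag keys map to "")."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def icgc_consequences(info: str) -> List[Tuple[str, str, str]]:
    """Return (gene symbol, transcript ID, consequence) for each CONSEQUENCE entry.

    Assumption (inferred from the example data, not an ICGC spec): entries are
    comma-separated, fields are pipe-delimited, and positions 0, 4 and 6 hold
    the symbol, transcript ID and consequence term.
    """
    raw = parse_info(info).get("CONSEQUENCE", "")
    result = []
    for entry in raw.split(","):
        parts = entry.split("|")
        if len(parts) < 7:
            continue  # too short to contain a consequence term
        result.append((parts[0], parts[4], parts[6]))
    return result
```

For the first record in the question this yields one `intergenic_region` entry with empty gene fields and one `upstream_gene_variant` entry for `RP11-413P11.1`.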

The file looks like it is already annotated.

Thank you all for the replies. I know it is annotated, but I was confused because VCF files annotated with SnpEff have an "ANN" field. I would like to check for mutations that affect transcription factor binding sites. Any idea how to do this, and which tool to use?

Have you tried filtering the output with "SnpSift"?
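SnpSift's `filter` command selects records using an expression over the INFO fields; for this case an expression along the lines of `ANN[*].EFFECT has 'TF_binding_site_variant'` should work, but check the SnpSift documentation for the exact syntax. To make the logic explicit, here is a rough plain-Python equivalent (not SnpSift itself) that keeps the header lines plus any record whose ANN field contains a `TF_binding_site_variant` entry. It assumes tab-separated VCF columns and the SnpEff ANN layout, in which the effect is the second pipe-delimited field and several effects may be joined with `&`. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List


def ann_entries(info: str) -> List[List[str]]:
    """Split the SnpEff ANN value of an INFO column into pipe-delimited fields."""
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "ANN":
            return [entry.split("|") for entry in value.split(",")]
    return []  # record carries no ANN key


def tfbs_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines plus records with at least one TF_binding_site_variant."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep the header so the output is still a VCF
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF data line
        for fields in ann_entries(cols[7]):
            # The effect is field 1; several effects may be joined with '&'.
            if len(fields) > 1 and "TF_binding_site_variant" in fields[1].split("&"):
                yield line
                break


# Example use (file names are placeholders):
# with open("annotated.vcf") as src, open("tfbs_only.vcf", "w") as dst:
#     dst.writelines(tfbs_records(src))
```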

Hello arup,

Yes, I'm now using SnpSift for filtering, but I don't see anything for checking the mutations that affect TF binding sites or motifs. Could you help me with this? After re-annotation, the data looks like the following:

1       100274466       MU2855033       T       C       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|LOW|||Srf|MA0083.1|||n.100274466A>G||||||,C|intergenic_region|MODIFIER|Y_RNA-AL451051.1|ENSG00000202254-ENSG00000252226|intergenic_region|ENSG00000202254-ENSG00000252226|||n.100274466T>C||||||
1       101774964       MU78905029      T       G       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101774964T>G||||||,G|intergenic_region|MODIFIER|PPIAP7-RP11-157N3.1|ENSG00000173810-ENSG00000231671|intergenic_region|ENSG00000173810-ENSG00000231671|||n.101774964T>G||||||
1       101774966       MU3316414       A       C       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=A>C;project_count=1;studies=PCAWG;tested_donors=12198;ANN=C|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101774966A>C||||||,C|intergenic_region|MODIFIER|PPIAP7-RP11-157N3.1|ENSG00000173810-ENSG00000231671|intergenic_region|ENSG00000173810-ENSG00000231671|||n.101774966A>C||||||
1       101823131       MU4639199       A       G       .       .       CONSEQUENCE=RP11-157N3.1|ENSG00000231671|+|RP11-157N3.1-002|ENST00000439146||intron_variant||,RP11-157N3.1|ENSG00000231671|+|RP11-157N3.1-001|ENST00000444327||intron_variant||;OCCURRENCE=LIRI-JP|2|258|0.00775;affected_donors=2;mutation=A>G;project_count=1;studies=PCAWG;tested_donors=12198;ANN=G|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.101823131T>C||||||,G|intron_variant|MODIFIER|RP11-157N3.1|ENSG00000231671|transcript|ENST00000444327|lincRNA|1/1|n.137+7390A>G||||||,G|intron_variant|MODIFIER|RP11-157N3.1|ENSG00000231671|transcript|ENST00000439146|lincRNA|4/4|n.418+7390A>G||||||
1       103657081       MU1116435       C       T       .       .       CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=C>T;project_count=1;studies=PCAWG;tested_donors=12198;ANN=T|TF_binding_site_variant|LOW|||AP1|MA0099.2|||n.103657081C>T||||||,T|TF_binding_site_variant|MODIFIER|||Cfos|MA0099.1|||n.103657081G>A||||||,T|intergenic_region|MODIFIER|COL11A1-RP11-347K2.1|ENSG00000060718-ENSG00000232753|intergenic_region|ENSG00000060718-ENSG00000232753|||n.103657081C>T||||||
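Since the re-annotated records now carry `TF_binding_site_variant` entries, the percentage asked about in the question can be computed by counting them. The sketch below is a hypothetical helper, not part of SnpEff or SnpSift: it counts all records, the records with at least one TF binding site annotation, and how often each (TF, motif ID) pair is hit. It assumes tab-separated columns (the forum display shows spaces) and, based only on example entries such as `|Srf|MA0083.1|`, that TFBS entries hold the TF name in field 5 and the motif ID in field 6 (0-based). The result also depends on which motif database was used during annotation.

```python
from collections import Counter
from typing import Iterable, Tuple


def tfbs_summary(lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Return (records, records with >=1 TFBS hit, hits per (TF, motif ID))."""
    total = 0
    with_tfbs = 0
    per_motif: Counter = Counter()
    for line in lines:
        # Skip meta-information, the column header and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        total += 1
        info = line.rstrip("\n").split("\t")[7]
        motifs = set()
        for item in info.split(";"):
            key, _, value = item.partition("=")
            if key != "ANN":
                continue
            for entry in value.split(","):
                fields = entry.split("|")
                # Assumption from the example data: field 5 = TF name, field 6 = motif ID.
                if len(fields) > 6 and "TF_binding_site_variant" in fields[1].split("&"):
                    motifs.add((fields[5], fields[6]))
        if motifs:
            with_tfbs += 1
            # Count each (TF, motif) pair at most once per record.
            per_motif.update(motifs)
    return total, with_tfbs, per_motif
```

With `total, hits, per_motif = tfbs_summary(open("annotated.vcf"))` (placeholder file name), the percentage is `100 * hits / total`, and `per_motif.most_common(10)` lists the most frequently hit motifs.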