Question: Remove a gene with multiple variants after snpEff annotation (VCF file)
Asked 3.2 years ago by zengtony743 (Canada):

Here is an example of the data lines in my VCF file:

chr10   4450587 .   T   C   2757.97 PASS    ANN=C|downstream_gene_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000143037.6|processed_transcript||n.*175T>C|||||175|,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000095893.9|protein_coding|3/4|c.393-60T>C||||||,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000118544.6|protein_coding|3/3|c.393-60T>C||||||,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000152294.1|nonsense_mediated_decay|2/3|c.148-60T>C||||||   GT:AD:DP:GQ:PL  0/1:18,12:30:99:334,0,510
chr10   5034864 .   T   G   58.16   PASS    ANN=G|downstream_gene_variant|MODIFIER|Gm25694|ENSMUSG00000093189|transcript|ENSMUST00000175448.1|ribozyme||n.*1311A>C|||||1311|,G|intron_variant|MODIFIER|Syne1|ENSMUSG00000096054|transcript|ENSMUST00000095899.3|protein_coding|13/16|c.2244+69A>C|||||| GT:AD:DP:GQ:PL  0/1:4,1:5:12:12,0,121
chr10   5231940 .   G   A   4507.45 PASS    ANN=A|intragenic_variant|MODIFIER|Syne1|ENSMUSG00000096054|gene_variant|ENSMUSG00000096054|||n.5231940C>T|||||| GT:AD:DP:GQ:PL  0/1:3,4:7:63:86,0,63
chr10   5248017 .   A   G   4754.39 PASS    ANN=G|intragenic_variant|MODIFIER|Syne1|ENSMUSG00000096054|gene_variant|ENSMUSG00000096054|||n.5248017T>C|||||| GT:AD:DP:GQ:PL  0/1:16,7:23:99:159,0,502
chr10   5248019 .   A   G   6149.69 PASS    ANN=G|intragenic_variant|MODIFIER|Syne1|ENSMUSG00000096054|gene_variant|ENSMUSG00000096054|||n.5248019T>C|||||| GT:AD:DP:GQ:PL  0/1:17,8:25:99:191,0,507
chr10   5298584 .   A   G   57.56   PASS    ANN=G|intragenic_variant|MODIFIER|Syne1|ENSMUSG00000096054|gene_variant|ENSMUSG00000096054|||n.5298584T>C|||||| GT:AD:DP:GQ:PL  0/1:2,3:5:56:96,0,56
chr10   6525873 .   C   A   151.79  PASS    ANN=A|intergenic_region|MODIFIER|Rgs17-Gm10945|ENSMUSG00000019775-ENSMUSG00000078488|intergenic_region|ENSMUSG00000019775-ENSMUSG00000078488|||n.6525873C>A||||||   GT:AD:DP:GQ:PL  0/1:6,1:7:14:14,0,192

As you can see, four different genes are annotated in the file: Armt1, Gm25694, Syne1 and Rgs17. However, four variants have been called in Syne1, and in my case they are false-positive calls. I need to remove Syne1 from the VCF file, so that only Armt1, Gm25694 and Rgs17 are left in the final table.

Here is the final VCF that I expect:

chr10   4450587 .   T   C   2757.97 PASS    ANN=C|downstream_gene_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000143037.6|processed_transcript||n.*175T>C|||||175|,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000095893.9|protein_coding|3/4|c.393-60T>C||||||,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000118544.6|protein_coding|3/3|c.393-60T>C||||||,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000152294.1|nonsense_mediated_decay|2/3|c.148-60T>C||||||   GT:AD:DP:GQ:PL  0/1:18,12:30:99:334,0,510
chr10   5034864 .   T   G   58.16   PASS    ANN=G|downstream_gene_variant|MODIFIER|Gm25694|ENSMUSG00000093189|transcript|ENSMUST00000175448.1|ribozyme||n.*1311A>C|||||1311|,G|intron_variant|MODIFIER|Syne1|ENSMUSG00000096054|transcript|ENSMUST00000095899.3|protein_coding|13/16|c.2244+69A>C|||||| GT:AD:DP:GQ:PL  0/1:4,1:5:12:12,0,121
chr10   6525873 .   C   A   151.79  PASS    ANN=A|intergenic_region|MODIFIER|Rgs17-Gm10945|ENSMUSG00000019775-ENSMUSG00000078488|intergenic_region|ENSMUSG00000019775-ENSMUSG00000078488|||n.6525873C>A||||||   GT:AD:DP:GQ:PL  0/1:6,1:7:14:14,0,192

I do not know how to write a script that removes genes which have multiple variants. I looked through vcftools and snpEff but did not find a tool that can do this. Can anyone provide a script, or does anyone know of a tool that can do this? Thanks.
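
One possible starting point is a minimal sketch in plain Python; it is not an existing vcftools or snpEff feature. It takes the gene name from the first ANN entry of each record (the fourth pipe-separated field), counts records per gene, and drops every record whose gene has more than one variant. The names `primary_gene`, `drop_multi_variant_genes` and `max_variants` are made up for illustration, and the sketch assumes a tab-delimited VCF with snpEff's standard ANN layout. Looking only at the first ANN entry is also an assumption; it is what keeps the chr10:5034864 record (Gm25694 first, Syne1 second) in the expected output above.

```python
from collections import Counter


def primary_gene(vcf_line):
    """Return the gene name of the first ANN entry, or None if there is none.

    Assumes tab-separated VCF columns and the snpEff ANN layout
    Allele|Annotation|Impact|Gene_Name|Gene_ID|...
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return None  # header line or malformed record
    for item in fields[7].split(";"):  # INFO column
        if item.startswith("ANN="):
            first_entry = item[4:].split(",")[0].split("|")
            return first_entry[3] if len(first_entry) > 3 else None
    return None


def drop_multi_variant_genes(lines, max_variants=1):
    """Keep header lines and records whose primary gene has at most
    `max_variants` records in the input; drop all other records."""
    lines = list(lines)
    # First pass: count records per primary gene (header lines skipped).
    counts = Counter()
    for line in lines:
        if not line.startswith("#"):
            gene = primary_gene(line)
            if gene is not None:
                counts[gene] += 1
    # Second pass: keep headers, unannotated records, and rare genes.
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        gene = primary_gene(line)
        if gene is None or counts[gene] <= max_variants:
            kept.append(line)
    return kept
```

On the seven example records this keeps Armt1, Gm25694 and Rgs17-Gm10945 and drops the four Syne1 records. To apply it to a file, something like `sys.stdout.writelines(drop_multi_variant_genes(open("annotated.vcf")))` should work, where `annotated.vcf` is a placeholder name. If only one named gene should go, the second pass can compare `gene` against that name instead of using the counts.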
