Remove a gene with multiple variants after snpEff annotation (VCF file)
7.7 years ago
zengtony743 ▴ 80

Here is an example of the data lines in my VCF file:

chr10   4450587 .   T   C   2757.97 PASS    ANN=C|downstream_gene_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000143037.6|processed_transcript||n.*175T>C|||||175|,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000095893.9|protein_coding|3/4|c.393-60T>C||||||,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000118544.6|protein_coding|3/3|c.393-60T>C||||||,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000152294.1|nonsense_mediated_decay|2/3|c.148-60T>C||||||   GT:AD:DP:GQ:PL  0/1:18,12:30:99:334,0,510
chr10   5034864 .   T   G   58.16   PASS    ANN=G|downstream_gene_variant|MODIFIER|Gm25694|ENSMUSG00000093189|transcript|ENSMUST00000175448.1|ribozyme||n.*1311A>C|||||1311|,G|intron_variant|MODIFIER|Syne1|ENSMUSG00000096054|transcript|ENSMUST00000095899.3|protein_coding|13/16|c.2244+69A>C|||||| GT:AD:DP:GQ:PL  0/1:4,1:5:12:12,0,121
chr10   5231940 .   G   A   4507.45 PASS    ANN=A|intragenic_variant|MODIFIER|Syne1|ENSMUSG00000096054|gene_variant|ENSMUSG00000096054|||n.5231940C>T|||||| GT:AD:DP:GQ:PL  0/1:3,4:7:63:86,0,63
chr10   5248017 .   A   G   4754.39 PASS    ANN=G|intragenic_variant|MODIFIER|Syne1|ENSMUSG00000096054|gene_variant|ENSMUSG00000096054|||n.5248017T>C|||||| GT:AD:DP:GQ:PL  0/1:16,7:23:99:159,0,502
chr10   5248019 .   A   G   6149.69 PASS    ANN=G|intragenic_variant|MODIFIER|Syne1|ENSMUSG00000096054|gene_variant|ENSMUSG00000096054|||n.5248019T>C|||||| GT:AD:DP:GQ:PL  0/1:17,8:25:99:191,0,507
chr10   5298584 .   A   G   57.56   PASS    ANN=G|intragenic_variant|MODIFIER|Syne1|ENSMUSG00000096054|gene_variant|ENSMUSG00000096054|||n.5298584T>C|||||| GT:AD:DP:GQ:PL  0/1:2,3:5:56:96,0,56
chr10   6525873 .   C   A   151.79  PASS    ANN=A|intergenic_region|MODIFIER|Rgs17-Gm10945|ENSMUSG00000019775-ENSMUSG00000078488|intergenic_region|ENSMUSG00000019775-ENSMUSG00000078488|||n.6525873C>A||||||   GT:AD:DP:GQ:PL  0/1:6,1:7:14:14,0,192

Four different genes are annotated in the file: Armt1, Gm25694, Syne1 and Rgs17. However, four variants have been called in Syne1, and in my case they are false-positive calls. I need to remove Syne1 from the VCF file, so I expect only Armt1, Gm25694 and Rgs17 to be left in the final table.

Here is the final VCF that I expect:

chr10   4450587 .   T   C   2757.97 PASS    ANN=C|downstream_gene_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000143037.6|processed_transcript||n.*175T>C|||||175|,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000095893.9|protein_coding|3/4|c.393-60T>C||||||,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000118544.6|protein_coding|3/3|c.393-60T>C||||||,C|intron_variant|MODIFIER|Armt1|ENSMUSG00000061759|transcript|ENSMUST00000152294.1|nonsense_mediated_decay|2/3|c.148-60T>C||||||   GT:AD:DP:GQ:PL  0/1:18,12:30:99:334,0,510
chr10   5034864 .   T   G   58.16   PASS    ANN=G|downstream_gene_variant|MODIFIER|Gm25694|ENSMUSG00000093189|transcript|ENSMUST00000175448.1|ribozyme||n.*1311A>C|||||1311|,G|intron_variant|MODIFIER|Syne1|ENSMUSG00000096054|transcript|ENSMUST00000095899.3|protein_coding|13/16|c.2244+69A>C|||||| GT:AD:DP:GQ:PL  0/1:4,1:5:12:12,0,121
chr10   6525873 .   C   A   151.79  PASS    ANN=A|intergenic_region|MODIFIER|Rgs17-Gm10945|ENSMUSG00000019775-ENSMUSG00000078488|intergenic_region|ENSMUSG00000019775-ENSMUSG00000078488|||n.6525873C>A||||||   GT:AD:DP:GQ:PL  0/1:6,1:7:14:14,0,192

I do not know how to write a script that removes genes with multiple variants. I have looked at vcftools and snpEff, but I did not find a tool that can do this. Can anyone provide a script or suggest a tool that does this? Thanks.
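For illustration, the filtering described above could be sketched in Python roughly as follows. This is only a minimal sketch under several assumptions: the real file is tab-delimited (the spaces in the example above come from display), the ANN field follows the standard snpEff layout (Allele|Annotation|Impact|Gene_Name|...), and each variant is assigned to the gene of its first ANN entry. That last assumption matches the expected output, where chr10:5034864 is kept because its first annotation is Gm25694 even though a later one mentions Syne1. The function names and the `max_variants` threshold are hypothetical and not part of any existing tool.

```python
from collections import Counter


def first_gene(info):
    """Return the gene name from the first ANN entry of an INFO string, or None."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            # ANN entries are comma-separated; sub-fields are pipe-separated:
            # Allele|Annotation|Impact|Gene_Name|Gene_ID|...
            parts = field[len("ANN="):].split(",")[0].split("|")
            if len(parts) > 3 and parts[3]:
                return parts[3]
    return None


def drop_multi_variant_genes(lines, max_variants=1):
    """Drop records whose gene (first ANN entry) has more than max_variants variants.

    Header lines, blank lines and records without an ANN gene are kept unchanged.
    """
    lines = list(lines)
    genes = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            genes.append(None)
            continue
        cols = line.rstrip("\n").split("\t")  # assumes tab-delimited VCF
        genes.append(first_gene(cols[7]) if len(cols) > 7 else None)
    # First pass done: count variants per gene, then filter on the counts.
    counts = Counter(g for g in genes if g is not None)
    return [
        line
        for line, g in zip(lines, genes)
        if g is None or counts[g] <= max_variants
    ]


def filter_vcf(in_path, out_path, max_variants=1):
    """Read a VCF from in_path and write the filtered records to out_path."""
    with open(in_path) as src:
        kept = drop_multi_variant_genes(src, max_variants)
    with open(out_path, "w") as dst:
        dst.writelines(kept)
```

It could be called as `filter_vcf("annotated.vcf", "filtered.vcf")`, where both file names are placeholders. The whole file is read into memory because the per-gene counts must be known before any record can be dropped; for very large files, two passes over the file would be an alternative.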

vcf filter multiple variants in one gene • 2.1k views