Question: SNPs genomic classification based on VCF annotation
0
gravatar for aarvaBioinfo
2.5 years ago by
aarvaBioinfo10
aarvaBioinfo10 wrote:

Dear All, I want to make a list of SNPs associated to with genes in Genomic region as below :

  1. downstream_gene_variant
  2. upstream_gene_variant
  3. synonymous_variant
  4. Intron_variant
  5. missense_variant
  6. 5'-prime_UTR_variant
  7. 3'-prime_UTR_variant

using my "Annotated vcf file" by snpEff program. and plot them using R program. so anyone has idea how can we do this?

snp R • 718 views
ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by aarvaBioinfo10
2
gravatar for cpad0112
2.5 years ago by
cpad011215k
Hyderabad India
cpad011215k wrote:

If you already have vep annotated vcf, you can filter by SO terms. You can go for lollipop plots per gene and histogram plot for each SO category

ADD COMMENTlink written 2.5 years ago by cpad011215k

hi cpad0112,

Thank you very much for your help. my VCF file was annotated with snpEff and I found that some SNPs shows both "upstream" and "downstream_gene_region" in the same data line so do you have some idea how can I filter this and get exact count and percentage of SNPs with the Gene ID as "downstream" , "upstream"......?

ADD REPLYlink written 2.5 years ago by aarvaBioinfo10

Sure. If you could post vep annotation data format, example data and output required, biostars would be able to help you. In the mean time, you can refer to ensembl documentation on filtering vep output here: https://asia.ensembl.org/info/docs/tools/vep/script/vep_filter.html. If output is vcf, you can also use bcftools to filter variants.

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by cpad011215k

hi, cpad0112,

Thank you for your reply and valuable link. I will try this. beside as you asked for example vcf set so here is example of annotated vcf data as: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Chr01 85514 . G C 222.003 PASS VDB=0.0681475;SGB=-0.693021;MQSB=1;MQ0F=0;AF1=1;AC1=2;MQ=37;FQ=-107.987;ANN=C|upstream_gene_variant|MODIFIER|Glyma.01G000500|Glyma.01G000500.Wm82.a2.v1|transcript|Glyma.01G000500.1.Wm82.a2.v1|protein_coding||c.-4775G>C|||||4775|,C|downstream_gene_variant|MODIFIER|Glyma.01G000400|Glyma.01G000400.Wm82.a2.v1|transcript|Glyma.01G000400.1.Wm82.a2.v1|protein_coding||c.5040C>G|||||4638|,C|intergenic_region|MODIFIER|Glyma.01G000300-Glyma.01G000400|Glyma.01G000300.Wm82.a2.v1-Glyma.01G000400.Wm82.a2.v1|intergenic_region|Glyma.01G000300.Wm82.a2.v1-Glyma.01G000400.Wm82.a2.v1|||n.85514G>C||||||DP=476;DP4=0,0,223,220 GT:PL Chr01 85582 . G A 222.003 PASS VDB=0.0681475;SGB=-0.693021;MQSB=1;MQ0F=0;AF1=1;AC1=2;MQ=37;FQ=-107.987;ANN=A|upstream_gene_variant|MODIFIER|Glyma.01G000500|Glyma.01G000500.Wm82.a2.v1|transcript|Glyma.01G000500.1.Wm82.a2.v1|protein_coding||c.-4707G>A|||||4707|,A|downstream_gene_variant|MODIFIER|Glyma.01G000400|Glyma.01G000400.Wm82.a2.v1|transcript|Glyma.01G000400.1.Wm82.a2.v1|protein_coding||c.4972C>T|||||4570|,A|intergenic_region|MODIFIER|Glyma.01G000300-Glyma.01G000400|Glyma.01G000300.Wm82.a2.v1-Glyma.01G000400.Wm82.a2.v1|intergenic_region|Glyma.01G000300.Wm82.a2.v1-Glyma.01G000400.Wm82.a2.v1|||n.85582G>A||||||DP=273;DP4=0,0,126,132 GT:PL Chr02 36650710 . G A,C 222.003 PASS VDB=0.0681475;SGB=-0.693021;MQSB=1;MQ0F=0;AF1=1;AC1=2;MQ=37;FQ=-107.987;ANN=A|5_prime_UTR_variant|MODIFIER|Glyma.16G205900|Glyma.16G205900.Wm82.a2.v1|transcript|Glyma.16G205900.1.Wm82.a2.v1|protein_coding|2/3|c.-211G>A|||||1806|,A|upstream_gene_variant|MODIFIER|Glyma.16G205800|Glyma.16G205800.Wm82.a2.v1|transcript|Glyma.16G205800.1.Wm82.a2.v1|protein_coding||c.-4909C>T|||||4909|;RPB=0.931841;MQB=1;BQB=0.295808;PV4=0.026087,0.00614403,1,1;DP=343;DP4=6,13,147,142
Chr02 36653298 . A T,G 222.008 PASS VDB=0.462961;SGB=-0.693021;MQSB=0.981338;MQ0F=0;AF1=1;AC1=2;MQ=32;FQ=-107.987;ANN=T|synonymous_variant|LOW|Glyma.16G205900|Glyma.16G205900.Wm82.a2.v1|transcript|Glyma.16G205900.1.Wm82.a2.v1|protein_coding|3/3|c.783A>T|p.Ala261Ala|1235/3686|783/2340|261/779||,T|upstream_gene_variant|MODIFIER|Glyma.16G206000|Glyma.16G206000.Wm82.a2.v1|transcript|Glyma.16G206000.1.Wm82.a2.v1|protein_coding||c.-3094A>T|||||2950|;DP=254;DP4=0,0,116,119 GT:PL
Chr03 36653505 . A G 222 PASS VDB=0.729579;SGB=-0.686358;MQSB=0.885987;MQ0F=0;AF1=1;AC1=2;MQ=36;FQ=-68.9862;ANN=G|synonymous_variant|LOW|Glyma.16G205900|Glyma.16G205900.Wm82.a2.v1|transcript|Glyma.16G205900.1.Wm82.a2.v1|protein_coding|3/3|c.990A>G|p.Gln330Gln|1442/3686|990/2340|330/779||,G|upstream_gene_variant|MODIFIER|Glyma.16G206000|Glyma.16G206000.Wm82.a2.v1|transcript|Glyma.16G206000.1.Wm82.a2.v1|protein_coding||c.-2887A>G|||||2743|;DP=277;DP4=0,0,127,138 GT:PL
Chr03 36653531 . A G 222.001 PASS VDB=0.341291;SGB=-0.683931;MQSB=0.899815;MQ0F=0;AF1=1;AC1=2;MQ=36;FQ=-65.9862;ANN=G|missense_variant|MODERATE|Glyma.16G205900|Glyma.16G205900.Wm82.a2.v1|transcript|Glyma.16G205900.1.Wm82.a2.v1|protein_coding|3/3|c.1016A>G|p.Asn339Ser|1468/3686|1016/2340|339/779||,G|upstream_gene_variant|MODIFIER|Glyma.16G206000|Glyma.16G206000.Wm82.a2.v1|transcript|Glyma.16G206000.1.Wm82.a2.v1|protein_coding||c.-2861A>G|||||2717|;DP=273;DP4=0,0,126,132 GT:PL

and format for required output : table

CHROM POS SNP-ID Ref Alt Gene_name Gene_ID SNP_types or SNP-classification SNP-number Percent(%)
Chr02 36653298 Chr02_36653298_A_T A T,G Glyma.16G205900 Glyma.16G205900.Wm82.a2.v1 synonymous_variant 10 0.5%

Is there a solution I can get this output using my annotated vcf file ??

ADD REPLYlink written 2.5 years ago by aarvaBioinfo10

Where are you getting 10 and 0.5% from, in expected output? aarvaBioinfo. Rest things are easy. Btw, it would help if you could post vcf headers some where and share with forum, in addition to vcf records (if it is not an issue).

ADD REPLYlink written 2.5 years ago by cpad011215k

hi, cpad0112, I highly appreciate you for help. I gave just for example 10 and 0.5%. not calculated value. sorry due to confidential, I could not share vcf data. if i just keep table header as: CHROM POS SNP-ID Ref Alt Gene_name Gene_ID SNP_types or SNP-classification then it can be made using my given example data?

ADD REPLYlink written 2.5 years ago by aarvaBioinfo10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1655 users visited in the last hour
_