Question: summarize SNPs and indels information in vcf file
Kurban (Xinjiang Academy of Animal Sciences, Urumqi, China) wrote, 5.0 years ago:

Using vcftools I got a file (my.var-final.vcf, 27 MB) which contains information on SNPs and indels:

##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT my-sorted.bam
comp904_c0_seq1 30 . G T 73.5 . DP=4;VDB=0.0014;AF1=1;AC1=2;DP4=0,0,4,0;MQ=60;FQ=-39 GT:PL:GQ 1/1:106,12,0:21
comp904_c0_seq1 37 . C T 52 . DP=4;VDB=0.0014;AF1=1;AC1=2;DP4=0,0,3,0;MQ=60;FQ=-36 GT:PL:GQ 1/1:84,9,0:16
comp904_c0_seq1 41 . A T 64.3 . DP=6;VDB=0.0020;AF1=1;AC1=2;DP4=0,0,5,0;MQ=60;FQ=-42 GT:PL:GQ 1/1:97,15,0:27
comp904_c0_seq1 74 . A G 4.77 . DP=21;VDB=0.0147;AF1=0.4999;AC1=1;DP4=10,5,3,1;MQ=60;FQ=6.99;PV4=1,1.2e-06,1,1 GT:PL:GQ 0/1:33,0,255:33
comp904_c0_seq1 133 . G T 137 . DP=36;VDB=0.0404;AF1=0.5;AC1=1;DP4=2,3,19,10;MQ=60;FQ=33;PV4=0.35,1.6e-09,1,1 GT:PL:GQ 0/1:167,0,60:63


Is there any way to summarize this variation information, for example with some tools or scripts?
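As a rough illustration of what such a summary can look like, here is a minimal Python sketch. It is not part of vcftools or of any tool mentioned in this thread, and the function names `summarize_vcf` and `main` are hypothetical. It assumes an uncompressed VCF like my.var-final.vcf above, treats a record as an indel when it carries the INDEL flag declared in the header or when REF and ALT differ in length, and classifies single-base substitutions as transitions or transversions.

```python
import sys
from collections import Counter

# Transitions: purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def summarize_vcf(lines):
    """Count SNPs, indels and transitions/transversions in VCF data lines."""
    counts = Counter()
    per_contig = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        # split() accepts the tabs of a real VCF as well as the spaces of a pasted copy
        fields = line.split()
        chrom, ref, alts, info = fields[0], fields[3], fields[4], fields[7]
        per_contig[chrom] += 1
        if alts == ".":
            counts["no_alt"] += 1  # record without an alternative allele
            continue
        alt_list = alts.split(",")
        if "INDEL" in info.split(";") or any(len(alt) != len(ref) for alt in alt_list):
            counts["indel"] += 1
        elif len(ref) == 1:
            counts["snp"] += 1
            for alt in alt_list:  # multi-allelic SNPs contribute one count per ALT allele
                pair = (ref.upper(), alt.upper())
                counts["transition" if pair in TRANSITIONS else "transversion"] += 1
        else:
            counts["other"] += 1  # e.g. multi-base substitutions (MNPs)
    return counts, per_contig


def main(path):
    with open(path) as handle:
        counts, per_contig = summarize_vcf(handle)
    for key in ("snp", "indel", "transition", "transversion", "other", "no_alt"):
        print(f"{key}\t{counts[key]}")
    if counts["transversion"]:
        print(f"ts/tv\t{counts['transition'] / counts['transversion']:.2f}")
    print(f"contigs with records\t{len(per_contig)}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved under any name (e.g. a hypothetical summarize_vcf.py), it would be run as `python summarize_vcf.py my.var-final.vcf`. The transition/transversion ratio is a common sanity check on a SNP call set.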


Thank you, guys.

(written 5.0 years ago by Kurban)
Zev.Kronenberg (United States) wrote, 5.0 years ago:

There are tons of tools that will give you what you want.  Here is one:

 

http://vcftools.sourceforge.net/documentation.html#file
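Besides counting variants, the QUAL column and the DP (read depth) INFO field are usually worth summarizing before choosing filters. The following minimal Python sketch is an assumption of mine rather than a VCFtools feature, and the helper names `info_value` and `quality_depth_summary` are hypothetical. It reports the count, minimum, median and maximum of both values.

```python
import statistics
import sys


def info_value(info, key):
    """Return the value of KEY in a VCF INFO string, or None if it is absent."""
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key and value:
            return value
    return None


def quality_depth_summary(lines):
    """Return count/min/median/max of the QUAL column and of INFO DP."""
    quals, depths = [], []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()  # tabs or spaces
        if fields[5] != ".":  # QUAL may be missing
            quals.append(float(fields[5]))
        dp = info_value(fields[7], "DP")  # "DP4=..." does not match the key "DP"
        if dp is not None:
            depths.append(int(dp))
    summary = {}
    for name, values in (("QUAL", quals), ("DP", depths)):
        if values:
            summary[name] = {
                "n": len(values),
                "min": min(values),
                "median": statistics.median(values),
                "max": max(values),
            }
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for name, stats in quality_depth_summary(handle).items():
            print(name, stats)
```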


Dear Zev.Kronenberg,

The data set I have analyzed is transcriptome data. I checked the tools you recommended: http://vcftools.sourceforge.net/perl_module.html

The command I used is:

kurban@kurban-X550VC:~/Desktop/SNPs/CD$ /home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats < my.var-final.vcf > out.txt

The terminal shows this:

Use of uninitialized value in pattern match (m//) at /home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats line 49.
Use of uninitialized value in concatenation (.) or string at /home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats line 49.
<: No such file or directory at /home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats line 18
    main::error('<: No such file or directory') called at /home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats line 50
    main::init_regions('HASH(0x84c998)') called at /home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats line 71
    main::do_stats('HASH(0x84c998)') called at /home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats line 9


Before running the command I located vcf-indel-stats, and this gives its locations:

kurban@kurban-X550VC:~/Desktop/SNPs/CD$ locate vcf-indel-stats
/home/kurban/.local/share/Trash/files/vcftools_0.1.12b/bin/vcf-indel-stats
/home/kurban/.local/share/Trash/files/vcftools_0.1.12b/perl/vcf-indel-stats
/home/kurban/.local/share/Trash/files/vcftools_0.2.1.12b/bin/vcf-indel-stats
/home/kurban/.local/share/Trash/files/vcftools_0.2.1.12b/perl/vcf-indel-stats
/home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats
/home/kurban/Downloads/vcftools_0.1.12b/perl/vcf-indel-stats

I do not know where I went wrong. Could you please take a look at the command above and give me some corrections?

Best regards,

Kurban

(written 5.0 years ago by Kurban)
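The vcf-indel-stats run above stopped with an error before writing any statistics. Independently of that tool, a basic indel summary can be sketched in a few lines of Python. This is not a reimplementation of vcf-indel-stats, and the function name `indel_length_summary` is hypothetical. It identifies indels by a length difference between REF and each ALT allele, tallies them by that length change (ALT length minus REF length), and counts whether the change is a multiple of 3. Without exon coordinates for these transcript contigs, the multiple-of-3 count is only a rough hint of how many indels could preserve a reading frame.

```python
import sys
from collections import Counter


def indel_length_summary(lines):
    """Tally indel ALT alleles by length change and by whether it is a multiple of 3."""
    by_length = Counter()
    frame = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()  # tabs or spaces
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):
            if alt in (".", "*") or alt.startswith("<"):
                continue  # missing, spanning-deletion or symbolic alleles
            change = len(alt) - len(ref)  # > 0 insertion, < 0 deletion
            if change == 0:
                continue  # SNP or MNP, not an indel
            by_length[change] += 1
            frame["multiple_of_3" if change % 3 == 0 else "not_multiple_of_3"] += 1
    return by_length, frame


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        by_length, frame = indel_length_summary(handle)
    for change in sorted(by_length):
        print(f"{change:+d}\t{by_length[change]}")
    print(dict(frame))
```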
EagleEye (Sweden) wrote, 5.0 years ago:

This post might be helpful to you:

Capturing clusters having T to C mutation

You can get a summary of SNPs with this script, which generates a graph and a table for each VCF 4.0 file (there must be one VCF per sample).

https://github.com/santhilalsubhash/TransExtract_betaV1.2 (the wiki still needs improvement).
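For a per-sample table of substitution types (for example, how many T>C changes there are), a minimal Python sketch along the following lines could also work. It is not the TransExtract script linked above, and the function name `snp_spectrum` is hypothetical. It counts single-base REF>ALT substitutions and the zygosity of the first sample's GT call, which fits the one-VCF-per-sample layout; indels and multi-allelic sites are skipped.

```python
import sys
from collections import Counter


def snp_spectrum(lines):
    """Count REF>ALT single-base substitutions and the first sample's zygosity."""
    spectrum = Counter()
    zygosity = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()  # tabs or spaces
        ref, alt = fields[3].upper(), fields[4].upper()
        if len(ref) != 1 or len(alt) != 1 or alt in (".", "*"):
            continue  # skips indels, multi-allelic sites and non-variant records
        spectrum[f"{ref}>{alt}"] += 1
        if len(fields) < 10:
            continue  # no FORMAT/sample columns
        keys, values = fields[8].split(":"), fields[9].split(":")
        if "GT" not in keys or keys.index("GT") >= len(values):
            continue
        alleles = values[keys.index("GT")].replace("|", "/").split("/")
        if "." in alleles:
            zygosity["missing"] += 1
        elif len(set(alleles)) > 1:
            zygosity["het"] += 1
        elif alleles[0] == "0":
            zygosity["hom_ref"] += 1
        else:
            zygosity["hom_alt"] += 1
    return spectrum, zygosity


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        spectrum, zygosity = snp_spectrum(handle)
    for change, n in spectrum.most_common():
        print(f"{change}\t{n}")
    print(dict(zygosity))
```

Applied to the five records shown in the question, this would report G>T twice, C>T, A>T and A>G once each, with three homozygous-alternative and two heterozygous calls.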

Natasha Latysheva (United Kingdom) wrote, 5.0 years ago:

Have you looked at the Variant Effect Predictor (http://www.ensembl.org/info/docs/tools/vep/index.html)?

I have used it for SNP calling, and I believe it reports indels as well. If you have genomic coordinates, it will simply match them against the reference genome for your species and report the differences, while predicting the effect of these variations at the mRNA/protein level, etc.


Thank you, Natasha. I searched the web for information on the tool you recommended, and it sounds like a pretty good tool. This may sound weird to you, but I am in Urumqi, China, and here I sometimes cannot open some sites; the link you provided (maybe the home page?) could not be viewed either. I do not know why, but sometimes that happens.

Best regards,

(written 5.0 years ago by Kurban)
alexisdereeper wrote, 3.4 years ago:

The SNiPlay online pipeline implements VCFtools and lets you compute summary statistics from a VCF file: http://sniplay.southgreen.fr/cgi-bin/analysis_v3.cgi

Powered by Biostar version 2.3.0