Summarizing SNP and indel information in a VCF file
7.0 years ago
Kurban ▴ 200

Using vcftools I got a file (my.var-final.vcf, 27 MB) which contains information on SNPs and indels:

 

##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT my-sorted.bam
comp904_c0_seq1 30 . G T 73.5 . DP=4;VDB=0.0014;AF1=1;AC1=2;DP4=0,0,4,0;MQ=60;FQ=-39 GT:PL:GQ 1/1:106,12,0:21
comp904_c0_seq1 37 . C T 52 . DP=4;VDB=0.0014;AF1=1;AC1=2;DP4=0,0,3,0;MQ=60;FQ=-36 GT:PL:GQ 1/1:84,9,0:16
comp904_c0_seq1 41 . A T 64.3 . DP=6;VDB=0.0020;AF1=1;AC1=2;DP4=0,0,5,0;MQ=60;FQ=-42 GT:PL:GQ 1/1:97,15,0:27
comp904_c0_seq1 74 . A G 4.77 . DP=21;VDB=0.0147;AF1=0.4999;AC1=1;DP4=10,5,3,1;MQ=60;FQ=6.99;PV4=1,1.2e-06,1,1 GT:PL:GQ 0/1:33,0,255:33
comp904_c0_seq1 133 . G T 137 . DP=36;VDB=0.0404;AF1=0.5;AC1=1;DP4=2,3,19,10;MQ=60;FQ=33;PV4=0.35,1.6e-09,1,1 GT:PL:GQ 0/1:167,0,60:63


Is there any way to summarize this variant information, for example with some tools or scripts?

snp • 7.0k views

Thank you, guys.

7.0 years ago

There are tons of tools that will give you what you want. Here is one: http://vcftools.sourceforge.net/documentation.html#file
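
For a quick first look without installing anything, a few lines of Python can give rough SNP, indel, and genotype counts. The sketch below is only an illustration under stated assumptions: the function name `summarize_vcf` is made up for this example and is not part of VCFtools, a record is treated as an indel if it carries the `INDEL` flag shown in the header above or if the REF and ALT lengths differ, and the genotype is read from the first sample column with `GT` as the first FORMAT key.

```python
from collections import Counter


def summarize_vcf(lines):
    """Count SNPs, indels, and genotype classes in an iterable of VCF lines.

    Hypothetical helper for illustration; not part of VCFtools.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        # VCF is tab-separated; split() also copes with space-separated copies
        fields = line.split()
        ref, alts, info = fields[3], fields[4].split(","), fields[7]
        # Assumption: INDEL flag or unequal allele lengths means an indel
        is_indel = "INDEL" in info.split(";") or any(len(a) != len(ref) for a in alts)
        counts["indel" if is_indel else "snp"] += 1
        if len(fields) > 9:
            # Assumption: GT is the first key in the FORMAT column
            alleles = fields[9].split(":")[0].replace("|", "/").split("/")
            if "." in alleles:
                counts["missing"] += 1
            elif len(set(alleles)) == 1:
                counts["hom_ref" if alleles[0] == "0" else "hom_alt"] += 1
            else:
                counts["het"] += 1
    return counts


# Usage (assuming the file from the question is in the working directory):
#     with open("my.var-final.vcf") as handle:
#         print(summarize_vcf(handle))
```

Only the first sample column is inspected, which matches the single-sample file in the question; a multi-sample VCF would need a loop over the remaining columns.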


Dear Mr. Kronenberg,

The dataset I analyzed is transcriptome data. I checked the tools you recommended: http://vcftools.sourceforge.net/perl_module.html

The command I used is this:

kurban@kurban-X550VC:~/Desktop/SNPs/CD$ /home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats < my.var-final.vcf > out.txt

The terminal output is:

Use of uninitialized value in pattern match (m//) at /home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats line 49.
Use of uninitialized value in concatenation (.) or string at /home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats line 49.
<: No such file or directory at /home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats line 18
    main::error('<: No such file or directory') called at /home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats line 50
    main::init_regions('HASH(0x84c998)') called at /home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats line 71
    main::do_stats('HASH(0x84c998)') called at /home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats line 9

I located vcf-indel-stats before running the command, and these are its locations:

kurban@kurban-X550VC:~/Desktop/SNPs/CD$ locate vcf-indel-stats
/home/kurban/.local/share/Trash/files/vcftools_0.1.12b/bin/vcf-indel-stats
/home/kurban/.local/share/Trash/files/vcftools_0.1.12b/perl/vcf-indel-stats
/home/kurban/.local/share/Trash/files/vcftools_0.2.1.12b/bin/vcf-indel-stats
/home/kurban/.local/share/Trash/files/vcftools_0.2.1.12b/perl/vcf-indel-stats
/home/kurban/Downloads/vcftools_0.1.12b/bin/vcf-indel-stats
/home/kurban/Downloads/vcftools_0.1.12b/perl/vcf-indel-stats

I do not know where I went wrong. Could you please take a look at the command I used and suggest some corrections?

Best regards,

Kurban

7.0 years ago
EagleEye 7.0k

This post might be helpful to you:

Capturing clusters having T to C mutation

You can get a summary of SNPs using this script, which generates a graph and a table for each VCF 4.0 file (there must be one VCF per sample).

https://github.com/santhilalsubhash/TransExtract_betaV1.2 (the wiki needs to be improved).
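
To give a rough idea of what a per-sample SNP table can contain, here is a minimal Python sketch that tallies substitution types (for example `T>C`) and a transition/transversion ratio. It is not the TransExtract script and does not reproduce its output; the names `substitution_spectrum` and `ts_tv_ratio` are hypothetical, and only single-base A/C/G/T substitutions are counted, so indels are ignored.

```python
from collections import Counter

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def substitution_spectrum(lines):
    """Count single-base substitutions (e.g. 'T>C') in an iterable of VCF lines."""
    spectrum = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.split()
        ref = fields[3].upper()
        for alt in fields[4].upper().split(","):
            # Keep only single-base A/C/G/T substitutions; this skips indels
            if len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT" and ref != alt:
                spectrum[f"{ref}>{alt}"] += 1
    return spectrum


def ts_tv_ratio(spectrum):
    """Return transitions / transversions, or NaN if there are no transversions."""
    ts = sum(n for key, n in spectrum.items() if tuple(key.split(">")) in TRANSITIONS)
    tv = sum(spectrum.values()) - ts
    return ts / tv if tv else float("nan")
```

Passing an open file handle for one sample's VCF to `substitution_spectrum` returns a `Counter` that can be printed as a table or plotted with any graphing library.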

7.0 years ago

Have you had a look at Variant Effect Predictor? (http://www.ensembl.org/info/docs/tools/vep/index.html)

I have used it for SNPs, and I believe it reports indels as well. If you have genomic coordinates, it matches them against the reference genome for your species and reports differences, while predicting the effect of these variants at the mRNA/protein level.


Thank you, Natasha. I searched online for information about the tool you recommended, and it sounds like a pretty good tool. It may sound strange to you, but I am in Urumqi, China, and here I sometimes cannot open certain sites. The link you provided (maybe the home page?) also could not be viewed. I do not know why, but sometimes that happens.

Best regards

5.3 years ago

The SNiPlay online pipeline implements VCFtools and can summarize statistics from a VCF file: http://sniplay.southgreen.fr/cgi-bin/analysis_v3.cgi

13 months ago

If you have a large number of VCFs that you want to summarize, it might be worthwhile to check out our tool: https://github.com/czbiohub/cerebra
