Question: Count Of Variants
6.3 years ago by
win790 wrote:

Hi there, is there a way to get counts of SNPs, indels, CNVs, etc. from a VCF file, something like

SNPs = ?

Insertions = ?

Deletions = ?

CNVs = ?

using simple Linux commands?

thanks, a

modified 4.0 years ago by Jorge Amigo11k • written 6.3 years ago by win790
6.3 years ago by
Boston, United States
matted6.9k wrote:

There are a couple of ways that variant type is annotated within a VCF file, so there are correspondingly a few ways to get close to what you want. Here's one choice that should work with most VCF files:

Use vcf-annotate, which ships with vcftools, to fill in the variant TYPE field:

zcat in.vcf.gz | vcftools_0.1.9/bin/vcf-annotate --fill-type > out.vcf

Then count up the variants by looking at the (newly-filled) TYPE field:

grep -oP "TYPE=\w+" out.vcf | sort | uniq -c

Or in one step that doesn't change the original VCF file:

zcat in.vcf.gz | vcftools_0.1.9/bin/vcf-annotate --fill-type | grep -oP "TYPE=\w+" | sort | uniq -c

On an example I had, this yielded:

3410 TYPE=del
4487 TYPE=ins
56744 TYPE=snp
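
If vcf-annotate is not available, a similar breakdown can be approximated with awk alone by comparing the lengths of the REF and ALT columns. This is only a minimal sketch: the function name count_variant_types and the class labels are made up for illustration, it assumes an uncompressed, tab-delimited VCF on standard input, and its classes may not match vcf-annotate's TYPE values exactly (symbolic alleles such as <DEL> or <CNV> are simply lumped into "other").

```shell
# Hypothetical helper: count variant classes from REF (column 4) and ALT (column 5).
# Reads an uncompressed VCF on stdin and prints one "count class" line per class.
count_variant_types() {
  awk -F'\t' '
    /^#/ { next }                          # skip meta-information and header lines
    {
      n = split($5, alts, ",")             # a record may list several ALT alleles
      for (i = 1; i <= n; i++) {
        a = alts[i]
        if (a ~ /^</ || index(a, "[") || index(a, "]") || a == "*" || a == ".")
          type = "other"                   # symbolic alleles, breakends, missing ALT
        else if (length($4) == 1 && length(a) == 1)
          type = "snp"
        else if (length(a) > length($4))
          type = "ins"
        else if (length(a) < length($4))
          type = "del"
        else
          type = "mnp"                     # same length, longer than one base
        count[type]++
      }
    }
    END { for (t in count) print count[t], t }
  ' | sort -k2,2                           # sort by class name for stable output
}
```

It could be used as: zcat in.vcf.gz | count_variant_types. Note that it counts ALT alleles rather than lines, so a multiallelic record contributes once per allele.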

1000 Genomes VCF files are annotated in a finer-grained way (choices include DUP, INV, CNV and TANDEM, see here), but I'm not sure how to get that range of annotations from your own raw read data. However, if these distinctions are critical to you, that may be a useful direction to explore.

modified 10 weeks ago by zx87545.6k • written 6.3 years ago by matted6.9k

This is so helpful! Thank you!

written 4 weeks ago by kelseyca0
4.0 years ago by
Jorge Amigo11k
Santiago de Compostela, Spain
Jorge Amigo11k wrote:

bcftools has a reporting tool that gives you this kind of information:

bcftools stats file.vcf > file.stats
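
The report is long, and the headline counts sit in its rows tagged SN (summary numbers). As a small sketch, they can be pulled out with grep and cut; the function name summary_numbers is hypothetical, and file.stats is assumed to be the tab-delimited report written by the command above.

```shell
# Hypothetical helper: print only the summary-number (SN) rows of a
# bcftools stats report, dropping the leading "SN" and id columns.
summary_numbers() {
  grep '^SN' "$1" | cut -f 3-
}
# Usage: summary_numbers file.stats
```

The rows left over are label/value pairs such as the number of records, SNPs and indels.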
written 4.0 years ago by Jorge Amigo11k

This doesn't seem to differentiate insertions from deletions; it reports them together as indels.

written 3.3 years ago by nchuang180
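
One possible workaround is the indel length distribution that bcftools stats also writes, in rows tagged IDD. The sketch below assumes column 3 of those rows is the indel length (negative for deletions) and column 4 is the count, which is worth confirming against the "# IDD" header line in your own report, since the layout may differ between versions. The function name split_indels is hypothetical.

```shell
# Hypothetical helper: sum the IDD rows of a bcftools stats report by sign of
# the indel length (assumed column 3), using the count in assumed column 4.
split_indels() {
  awk -F'\t' '
    $1 == "IDD" && $3 < 0 { del += $4 }    # negative length: deletion
    $1 == "IDD" && $3 > 0 { ins += $4 }    # positive length: insertion
    END { printf "Insertions = %d\nDeletions = %d\n", ins, del }
  ' "$1"
}
# Usage: split_indels file.stats
```

If the report was produced from more than one input file, the rows for all ids are summed together here; filtering on column 2 would separate them.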
