Question

Filtering on the minor allele in VCFtools

8.4 years ago

outlier95 ▴ 30

Wondering how I can get the number of informative sites in a .vcf file using VCFtools. By informative I mean at least two samples share a variant. Any suggestions? Thanks.

vcftools snps • 4.9k views

updated 8.4 years ago by Adam ★ 1.0k • written 8.4 years ago by outlier95 ▴ 30

Not sure about VCFtools, but if you are up for trying something new, the Variant Effect Predictor (VEP) reports MAF for data in .vcf files. Look at the filtering options available for the VEP, which include filtering on frequency, i.e. MAF.

8.4 years ago by Denise CS ★ 5.2k

GATK indicates they do this for their best practices and then makes the reader scavenge around the internet to find out how to do it instead of giving a resource. Shame.

3.8 years ago by bjwiley23 ▴ 40

score 0 · Answer 1 · 2016-06-11

Using vcffilterjs (https://github.com/lindenb/jvarkit/wiki/VCFFilterJS): the script below accepts a variant when fewer than two samples have a hom-var or het genotype, so the remaining variants (two or more carriers) get INFORMATIVE in the FILTER column. Then extract the FILTER column and count the number of lines containing INFORMATIVE.

cat input.vcf |\
java -jar dist/vcffilterjs.jar -F INFORMATIVE -e 'function accept(v) { var f=0,i;for(i=0;i<v.getNSamples();++i) {var g=v.getGenotype(i); f+=(g.isHomVar() || g.isHet()?1:0);} return f<2;}accept(variant);' |\
grep -v "^#" | cut -f 7 | grep -c INFORMATIVE

score 0 · Answer 2 · 2016-06-11

8.4 years ago

Adam ★ 1.0k

vcftools --gzvcf vcf_file --mac 2 --recode --stdout | grep -vc '^#'
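
One caveat on the `--mac 2` filter above: it counts copies of the minor allele, not samples, so a site where a single sample is homozygous for the variant also passes. If the stricter per-sample definition from the question is what is wanted, a minimal sketch in plain awk could look like the following. `count_informative_sites` is a hypothetical helper name, not a VCFtools feature. It assumes an uncompressed, tab-delimited VCF with GT as the first per-sample subfield, and it counts a sample as a carrier if it has any non-reference, non-missing allele (so at multi-allelic sites it does not check that the carriers share the same ALT allele).

```shell
# count_informative_sites [FILE]
# Hypothetical helper: prints the number of sites at which at least two
# samples carry a non-reference allele. Reads stdin when no FILE is given.
count_informative_sites() {
    awk -F '\t' '
        /^#/ { next }                         # skip meta and header lines
        {
            carriers = 0
            for (i = 10; i <= NF; i++) {      # sample columns start at field 10
                split($i, sub_fields, ":")    # GT is assumed to be the first subfield
                n = split(sub_fields[1], alleles, "[/|]")
                for (j = 1; j <= n; j++) {
                    # any allele other than reference (0) or missing (.) makes a carrier
                    if (alleles[j] != "0" && alleles[j] != ".") { carriers++; break }
                }
            }
            if (carriers >= 2) informative++
        }
        END { print informative + 0 }
    ' "$@"
}
```

For a compressed file, something like `gzip -dc input.vcf.gz | count_informative_sites` should work, since the function falls back to stdin.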

8.4 years ago by Adam ★ 1.0k

MOD-EDIT: OP has opened a new question for this here: Identifying private and shared SNPs using VCFtools

8.4 years ago by outlier95 ▴ 30

If I remove singletons and private doubletons via --singletons and --positions using VCFtools, then take the difference in the number of SNPs before and after filtering, should that amount to the number of informative SNPs (per my definition)? Many thanks.
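
A rough sketch of that bookkeeping, under some assumptions: the `.singletons` file written by `vcftools --singletons` is taken to be tab-delimited with one header line and CHROM and POS in the first two columns, and `informative_from_singletons` is a hypothetical helper name. Note that in VCFtools `--positions` keeps the listed sites while `--exclude-positions` removes them, and that the number of sites left after removal is itself the informative count (the before/after difference is the number of singletons and private doubletons).

```shell
# informative_from_singletons VCF SINGLETONS
# Hypothetical helper: total sites in an uncompressed VCF minus the distinct
# positions listed in a .singletons file (assumed layout: header line, then
# CHROM and POS in the first two tab-delimited columns).
informative_from_singletons() {
    total=$(grep -vc '^#' "$1")               # data lines in the VCF
    private=$(awk -F '\t' '
        NR > 1 { seen[$1 ":" $2] = 1 }        # skip header, dedupe by position
        END { n = 0; for (k in seen) n++; print n }
    ' "$2")
    echo $((total - private))
}
```

Called as `informative_from_singletons input.vcf out.singletons`, this prints the number of sites that are neither singletons nor private doubletons.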

8.4 years ago by outlier95 ▴ 30