I am doing SNP analysis on whole-genome Saccharomyces cerevisiae data. I want to separate the homozygous variants from the heterozygous variants. How do I go about it?
Look at the GT field for the most likely genotype.
For a diploid sample without multi-allelic loci:
grep "^#\|1\/1" snps.vcf > hom-alt.vcf
I tried this, but "0/1" is still present in the VCF.
Show us your command line. Is there only one sample? Can you find the string '1/1' elsewhere on those lines?
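If the VCF holds more than one sample, or the string 1/1 turns up in another column, a plain grep will keep lines where the sample of interest is actually 0/1. A rough sketch with awk that looks only at the GT subfield of one sample column is below. The function name split_by_zygosity and the output file names are made up for illustration. It assumes diploid calls, GT as the first FORMAT subfield (which the VCF specification requires when GT is present), and the sample of interest in column 10 unless another column is given.

```shell
# split_by_zygosity: hypothetical helper, not part of any VCF toolkit.
# Usage: split_by_zygosity in.vcf hom.vcf het.vcf [sample_column]
split_by_zygosity() {
  awk -v hom="$2" -v het="$3" -v col="${4:-10}" '
    BEGIN { FS = "\t" }
    /^#/ { print > hom; print > het; next }    # copy header lines to both outputs
    {
      split($col, f, ":")                      # GT is the first colon-separated subfield
      n = split(f[1], a, "[/|]")               # alleles, unphased (/) or phased (|)
      if (n != 2 || a[1] == "." || a[2] == ".") next   # skip no-calls and non-diploid calls
      if (a[1] != a[2]) print > het            # two different alleles: heterozygous
      else if (a[1] != "0") print > hom        # same non-reference allele twice: hom-alt
    }
  ' "$1"
}
```

For the first sample this would be called as: split_by_zygosity snps.vcf hom-alt.vcf het.vcf. Homozygous-reference calls (0/0) and no-calls (./.) end up in neither file.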
Using my tool vcffilterjs (https://github.com/lindenb/jvarkit/wiki/VCFFilterJS):
java -jar vcffilterjs.jar -e 'variant.getGenotype("SAMPLENAME").isHet()' your.vcf.gz > het.vcf
Alternatively, replace isHet() with isHom(), isHomVar() or isHomRef().
Or use GATK VariantFiltration: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_filters_VariantFiltration.html
SnpSift is a useful tool if you want to query a multi-sample VCF, for example to keep only the sites where all samples are homozygous (other types of queries are possible too).
For instance, for four samples:
java -jar SnpSift.jar filter "countHom() = 4 & !(GEN[0].GT='./.') & !(GEN[1].GT='./.') & !(GEN[2].GT='./.') & !(GEN[3].GT='./.')" -f my.vcf
It's unfortunate that countHom() also counts a no-call (./.) as homozygous, but the !(GEN[n].GT='./.') terms, one per sample, deal with that.