How to extract homozygous variants from a VCF file?
9.9 years ago
Parimala Devi ▴ 100

I am doing SNP analysis on whole-genome Saccharomyces cerevisiae data. I want to separate the homozygous variants from the heterozygous variants. How do I go about it?

SNP Homozygotes • 4.5k views
9.9 years ago
Gabriel R. ★ 2.9k

Look at the GT field for the most likely genotype.
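To make the GT lookup concrete, here is a minimal Python sketch. It assumes a tab-separated VCF data line in which column 9 is FORMAT and the sample columns follow. The helper names `classify_gt` and `sample_gt` are made up for illustration and do not come from any library.

```python
import re


def classify_gt(gt: str) -> str:
    """Classify a VCF GT string such as '0/1', '1|1' or './.'."""
    # '/' separates unphased alleles, '|' separates phased alleles.
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return "no_call"
    if len(set(alleles)) > 1:
        return "het"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


def sample_gt(vcf_line: str, sample_index: int = 0) -> str:
    """Return the GT value of one sample from a VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")            # column 9 is FORMAT
    sample_values = fields[9 + sample_index].split(":")
    return sample_values[format_keys.index("GT")]


# Hypothetical record, invented for this example.
line = "chrI\t1234\t.\tA\tG\t50\tPASS\t.\tGT:DP\t1/1:30"
print(classify_gt(sample_gt(line)))
```

Looking GT up through the FORMAT column, rather than assuming a fixed position, keeps the sketch independent of the order of the FORMAT keys.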

9.9 years ago
Vivek ★ 2.7k

For a diploid sample without multi-allelic loci:

grep "^#\|1\/1" snps.vcf > hom-alt.vcf


I tried this, but I still found "0/1" present in the VCF.


Show us your command line. Is there only one sample? Can the string 1/1 be found elsewhere on those lines?
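One possible explanation, not confirmed in this thread, is a multi-sample VCF: a line-level text match keeps any record where 1/1 occurs in any sample column, even if the sample of interest is 0/1. A rough Python sketch that checks the GT of a single sample instead is below. The function names and the two example records are hypothetical.

```python
def is_hom_alt(gt):
    """True if every allele is called, identical and non-reference."""
    alleles = gt.replace("|", "/").split("/")
    return "." not in alleles and "0" not in alleles and len(set(alleles)) == 1


def keep_hom_alt(lines, sample_index=0):
    """Yield header lines plus records where the chosen sample is hom-alt."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        keys = fields[8].split(":")               # column 9 is FORMAT
        gt = fields[9 + sample_index].split(":")[keys.index("GT")]
        if is_hom_alt(gt):
            yield line


# Both records contain the text "1/1", so a plain text match keeps both.
records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chrI\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1",
    "chrI\t200\t.\tC\tT\t50\tPASS\t.\tGT\t1/1\t0/1",
]
kept = list(keep_hom_alt(records, sample_index=0))
print(len(kept))  # header plus the record at position 200
```

With `sample_index=0` only the record at position 200 survives, because that is the only one where the first sample itself is 1/1.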

9.9 years ago

Using my tool vcffilterjs:

java -jar vcffilterjs.jar -e 'variant.getGenotype("SAMPLENAME").isHet()' your.vcf.gz > het.vcf

Or use isHom(), isHomVar() or isHomRef() in place of isHet().

Or use GATK VariantFiltration.

9.9 years ago
Yahan ▴ 400

SnpSift is a useful tool if you want to query a multi-sample VCF, for example to require that all samples be homozygous (or for other types of queries).

For instance, for four samples:

java -jar SnpSift.jar filter "countHom() = 4 & !(GEN[0].GT='./.') & !(GEN[1].GT='./.') & !(GEN[2].GT='./.') & !(GEN[3].GT='./.')" -f my.vcf

It's unfortunate that it considers a no call (./.) also as a homozygote, but the !(GEN[0].GT='./.') ... part deals with that.
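The logic of that filter (every sample called, and every sample homozygous) can be sketched in plain Python. This is only an illustration of the rule and does not reproduce SnpSift internals. The function name `all_called_and_hom` is made up, and like the filter above it accepts homozygous reference as well as homozygous alternate genotypes.

```python
def all_called_and_hom(gts):
    """True if every GT string is fully called and homozygous."""
    for gt in gts:
        alleles = gt.replace("|", "/").split("/")
        # Reject no-calls explicitly so './.' is never counted as homozygous.
        if "." in alleles or len(set(alleles)) != 1:
            return False
    return True


# Hypothetical genotypes for four samples at three sites.
print(all_called_and_hom(["1/1", "0/0", "1/1", "1|1"]))
print(all_called_and_hom(["1/1", "./.", "1/1", "1/1"]))
print(all_called_and_hom(["1/1", "0/1", "1/1", "1/1"]))
```

Only the first site passes; the second has a no-call and the third has a heterozygous sample.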
