How to extract homozygous variants from a VCF file?
6.9 years ago

I am doing SNP analysis on whole-genome Saccharomyces cerevisiae data. I want to separate the homozygous variants from the heterozygous variants. How do I go about it?

SNP Homozygotes • 3.0k views
6.9 years ago
Gabriel R. ★ 2.8k

Look at the GT field for the most likely genotype.
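
As a minimal sketch of what "looking at the GT field" could mean in practice, the Python below pulls GT out of a tab-separated VCF record using the FORMAT column and classifies it. The function names genotype_of and classify are hypothetical, and the code assumes a well-formed record with a FORMAT column that contains GT.

```python
def genotype_of(record_line, sample_index=0):
    """Return the GT string of one sample from a tab-separated VCF record."""
    fields = record_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")                    # column 9 is FORMAT
    sample_values = fields[9 + sample_index].split(":")   # samples start at column 10
    return sample_values[format_keys.index("GT")]


def classify(gt):
    """Classify a GT string as 'no_call', 'het', 'hom_ref' or 'hom_alt'."""
    alleles = gt.replace("|", "/").split("/")   # treat phased and unphased alike
    if "." in alleles:
        return "no_call"
    if len(set(alleles)) > 1:
        return "het"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


record = "chrI\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t1/1:20"
print(classify(genotype_of(record)))
```

Because the check is done on the parsed genotype rather than on the raw line, phased calls such as 1|1 and multi-allelic calls such as 2/2 are handled the same way as 1/1.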

6.9 years ago
Vivek ★ 2.5k

For a diploid sample without multi-allelic loci:

grep "^#\|1\/1" snps.vcf > hom-alt.vcf


I tried this, but I still find "0/1" genotypes in the output VCF.


Show us your command line. Is there only one sample? Can the string '1/1' be found elsewhere on those lines?
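
One way this can happen, sketched in Python with a made-up two-sample record: a line-level pattern match keeps any record where at least one sample is 1/1, even if another sample on the same line is 0/1. The record below is invented for illustration and is not taken from the poster's file.

```python
import re

# Hypothetical two-sample record: sample 1 is het, sample 2 is hom-alt.
record = "chrI\t1000\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1"

# grep-style test: does "1/1" occur anywhere on the line?
line_matches = re.search(r"1/1", record) is not None

# Per-sample test: is every sample's GT homozygous for the alternate allele?
sample_fields = record.split("\t")[9:]
genotypes = [field.split(":")[0] for field in sample_fields]  # GT is the first subfield here
all_hom_alt = all(gt in ("1/1", "1|1") for gt in genotypes)

print(line_matches, all_hom_alt)
```

The line passes the grep-style test but fails the per-sample test, so "0/1" ends up in the filtered file.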

6.9 years ago

You can use my tool vcffilterjs: https://github.com/lindenb/jvarkit/wiki/VCFFilterJS

 java -jar vcffilterjs.jar -e 'variant.getGenotype("SAMPLENAME").isHet()' your.vcf.gz > het.vcf

Replace isHet() with isHom(), isHomVar() or isHomRef() to select homozygous genotypes instead.

 

Alternatively, use GATK VariantFiltration: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_filters_VariantFiltration.html

 

6.9 years ago
Yahan ▴ 390

SnpSift is a useful tool if you want to query a multi-sample VCF, for example to keep only sites where all samples are homozygous.

For instance, for four samples:

java -jar SnpSift.jar filter "countHom() = 4 & !(GEN[0].GT='./.') & !(GEN[1].GT='./.') & !(GEN[2].GT='./.') & !(GEN[3].GT='./.')" -f my.vcf

Unfortunately, it also counts a no-call (./.) as homozygous, but the !(GEN[n].GT='./.') terms exclude those sites.
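
The intent of that filter can be sketched in plain Python: keep a site only if every sample is homozygous and none is a no-call. This is an illustration of the logic, not SnpSift's implementation, and the function name all_samples_called_hom is hypothetical.

```python
def all_samples_called_hom(genotypes):
    """True if every GT in the list is a called, homozygous genotype."""
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:           # no-call such as ./. is rejected explicitly
            return False
        if len(set(alleles)) != 1:   # more than one distinct allele means het
            return False
    return True


print(all_samples_called_hom(["1/1", "0/0", "1|1", "1/1"]))
print(all_samples_called_hom(["1/1", "./.", "1/1", "1/1"]))
```

Checking for the no-call first is the Python counterpart of the !(GEN[n].GT='./.') terms in the command.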

 
