Extracting heterozygous SNPs from a VCF file
5.9 years ago
skdutta2091 ▴ 40

Dear friends,

I am very new to next-generation sequencing. Using mpileup and bcftools from samtools, I have generated a VCF file. I want to write a Perl script to extract only the heterozygous SNPs. How can I tell which SNPs are heterozygous, and how can they be extracted with Perl or any other scripting language?

I would be very grateful if anybody could shed some light on this.

Regards,

Shubhankar

next-gen SNP • 5.0k views
5.9 years ago
always_learning ★ 1.1k

A genotype (GT) field of 0/1 is heterozygous: one reference allele and one alternate allele. You don't even need to write a script for that.

vcftools --gzvcf file.vcf.gz --extract-FORMAT-info GT --stdout | grep "0/1"

will work
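Since the question asks how to recognise a heterozygous call in a script, here is a minimal Python sketch of the GT test itself. It assumes the usual VCF conventions: alleles are numbered (0 is the reference), separated by "/" (unphased) or "|" (phased), and "." marks a missing call. The function name is_heterozygous is made up for this illustration.

```python
import re

def is_heterozygous(gt: str) -> bool:
    """Return True when a VCF GT string names at least two different alleles."""
    # Split on either separator: "/" (unphased) or "|" (phased).
    alleles = re.split(r"[/|]", gt)
    # Haploid or missing calls cannot be called heterozygous here.
    if len(alleles) < 2 or "." in alleles:
        return False
    return len(set(alleles)) > 1

# Small demonstration on typical genotype strings.
for gt in ["0/1", "1|0", "0/0", "1/1", "./.", "1/2"]:
    print(gt, is_heterozygous(gt))
```

Unlike grep "0/1", this also accepts phased calls such as 1|0 and calls with two different alternate alleles such as 1/2.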


Dear Syed, I have paired-end reads. Will it work for those too?


It won't work if you have multiple alternate alleles at a position (you'll then have genotypes like 0/2). I recommend using vcfbreakmulti from vcflib, or LeftAlignAndTrimVariants from GATK with the option --splitMultiallelics, before running this command.
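If splitting the records first is not convenient, a script can resolve the allele indices itself. The following is a rough Python sketch, not a tested pipeline: it assumes a tab-separated VCF, reads the genotype of one sample column (the first by default), and keeps a line only when the called alleles differ and are all single bases. The function name het_snp_lines and the demo records are invented for this example.

```python
import re

def het_snp_lines(lines, sample_index=0):
    """Yield VCF data lines where the chosen sample is heterozygous for SNP alleles."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= 9 + sample_index:
            continue  # no genotype column for this sample
        ref, alt, fmt = fields[3], fields[4], fields[8]
        keys = fmt.split(":")
        if "GT" not in keys:
            continue
        gt = fields[9 + sample_index].split(":")[keys.index("GT")]
        idx = re.split(r"[/|]", gt)
        if "." in idx or len(set(idx)) < 2:
            continue  # missing, haploid or homozygous call
        # Allele 0 is REF; 1, 2, ... index into the comma-separated ALT list.
        alleles = [ref] + alt.split(",")
        called = [alleles[int(i)] for i in idx]
        # Keep only calls where every allele is a single base (a SNP, not an indel).
        if all(len(a) == 1 and a.upper() in "ACGT" for a in called):
            yield line.rstrip("\n")

demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "chr1\t100\t.\tA\tG\t50\t.\t.\tGT:PL\t0/1:50,0,50",
    "chr1\t200\t.\tC\tT\t50\t.\t.\tGT:PL\t1/1:50,10,0",
    "chr1\t300\t.\tG\tA,T\t50\t.\t.\tGT\t1/2",
    "chr1\t400\t.\tAT\tA\t50\t.\t.\tGT\t0/1",
]
for hit in het_snp_lines(demo):
    print(hit)
```

On the demo records this keeps positions 100 (0/1) and 300 (1/2) and drops the homozygous call at 200 and the deletion at 400. For a real file, pass an open file handle instead of the demo list.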
