Question: Extracting heterozygous SNPs from a VCF file
skdutta2091 (United States) wrote, 5.3 years ago:

Dear friends,

I am very new to next-generation sequencing. Using mpileup and bcftools from samtools, I have generated a VCF file. I want to write a Perl script to extract only the heterozygous SNPs. How can I tell which SNPs are heterozygous, and how can they be extracted with Perl or another scripting language?

I would be very grateful if anybody could shed some light on this.

Regards,

Shubhankar

snp next-gen • 4.6k views
(modified 5.3 years ago by always_learning)
always_learning (Doha, Qatar) wrote, 5.3 years ago:

A GT (genotype) field of 0/1 means the call is heterozygous. You don't even need to write a script for that.

vcftools --gzvcf file.vcf.gz --extract-FORMAT-info GT --stdout | grep "0/1"

will work
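Since the question asked for a script, the same check can also be written out by hand. The following is a minimal Python sketch, not a definitive implementation. It assumes a plain-text, tab-delimited, single-sample VCF (the genotype is read from the first sample column only), and it additionally accepts the phased forms 0|1 and 1|0, which the grep above would miss. The function name `heterozygous_snps` and the constants are illustrative, not part of any library.

```python
# Genotype strings treated as heterozygous at a biallelic site
# (unphased and phased forms). This set is an assumption of the sketch.
HET_GENOTYPES = {"0/1", "1/0", "0|1", "1|0"}
BASES = {"A", "C", "G", "T", "N"}


def heterozygous_snps(lines):
    """Yield VCF data lines whose first sample is a heterozygous biallelic SNP."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, so no genotype to inspect
        ref, alt, fmt, sample = fields[3], fields[4], fields[8], fields[9]
        # Keep SNPs only: REF and ALT must each be a single base.
        # This also drops indels and multi-allelic ALT values such as "G,T".
        if ref.upper() not in BASES or alt.upper() not in BASES:
            continue
        keys = fmt.split(":")
        if "GT" not in keys:
            continue
        # Read the sample value that lines up with the GT key in FORMAT.
        gt = sample.split(":")[keys.index("GT")]
        if gt in HET_GENOTYPES:
            yield line.rstrip("\n")
```

To use it on a file (the file name here is hypothetical), open the VCF and print each record the generator yields, for example `for rec in heterozygous_snps(open("calls.vcf")): print(rec)`. A gzipped file would need `gzip.open(path, "rt")` instead of `open`.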

(modified 15 months ago by Ram)

Dear Syed, I have paired-end reads. Will it work for those too?

(reply written 5.3 years ago by skdutta2091)

It won't work if you have multiple alternate alleles at a position (you'll then have heterozygous genotypes like 0/2 or 1/2). I recommend using vcfbreakmulti from vcflib, or LeftAlignAndTrimVariants from GATK with the --splitMultiallelics option, before running this command.
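If splitting the records first is not convenient, a script can compare the allele indices instead of matching the literal string 0/1. The sketch below is one hedged way to do that in Python: it treats a diploid genotype as heterozygous whenever its two alleles differ, which covers 0/2 and 1/2 as well. The function name `is_heterozygous` is illustrative, and returning False for missing or non-diploid calls is an assumption of this sketch.

```python
import re


def is_heterozygous(gt):
    """Return True if a diploid GT string such as '0/1' or '1|2' has two different alleles."""
    # Alleles are separated by "/" (unphased) or "|" (phased).
    alleles = re.split(r"[/|]", gt)
    if len(alleles) != 2 or "." in alleles:
        # Haploid, polyploid, or missing calls are not classified here.
        return False
    return alleles[0] != alleles[1]
```

Note that this only says the two alleles differ. It does not say which ALT alleles are involved, so splitting multi-allelic records is still the cleaner route when downstream tools expect one ALT per line.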

(reply written 5.3 years ago by mkulecka; modified 15 months ago by Ram)