Hi everyone! I'm a very very new to these variant formats. So please, kindly help me out. I have a SNP vcf file. And I need the hom and het variants. But I dont want to use the 012 output format of vcftools, its just too much of hassle. So I was thinking of using bcftools/samtools. I was thinking of using the FQ tag for this task. FQ : Consensus quality. If positive, FQ equals the phred-scaled probability of there being two or more different alleles. If negative, FQ equals the minus phred-scaled probability of all chromosomes being identical. Notably, given one sample, FQ is positive at hets and negative at homs. But my vcf file does not contain FQ tags in the INFO column. So I'm guessing I'd have to use vcf2fq command to get FQ. How do I go about issuing this command? Please help.
I am not sure if I understand your problem, but wouldn't just looking at AF enough?
In your VCF file in 8th column, AF means allele frequency for each ALT allele in the same order as listed. So essentially AF=0.5 would be heterozygous site.
vcf2fq command is a part of vcfutils.pl program and is used to generate consensus sequence for a given diploid organism.
vcfutils.pl vcf2fq input.vcf > consensus.fq
But depending on your need I think using the GT tag in the vcf file should be the right approach. For a diploid organisms, you will see on of these values: 1/1, 0/1 or 0/0 in each row.
1/1 represents homozygous SNP. In other words, both the chromosomes in a homologous pair show different nucleotides.
0/1 represents heterozygous SNP. In other words, one chromosome from the pair of homologous chromosomes shows different nucleotide and the other chromosome has the same nucleotide as reference.
0/0 represents homozygous reference represents "no SNP" as each chromosome in a homologous pair contains the same nucleotide as reference.
All you need to do is grep these rows from your vcf file. For example for case 1/1,
egrep "^#|1/1:" Input.vcf > Homozygous_SNP.vcf