Hello everybody.
I am trying to find out how to tell whether a SNP is heterozygous or homozygous from the fields of a VCF file.
Maybe this is a trivial question, but even though I have read the VCF documentation I am still confused.
Here are some examples.
First, the following call from freebayes:
chr24 51 . C G 1458.24 . GT:DP:RO:QR:AO:QA:GL 1/1:50:0:0:50:1802:-5,-5,0
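To pull GT and GL out of a record like this programmatically, a minimal Python sketch could split the line as below. It assumes the record is exactly as pasted here: whitespace-separated, a single sample, and the last two columns being FORMAT and the sample (the INFO column seems to have been trimmed from these lines). `parse_record` is a hypothetical helper, not part of any library; for real files a VCF library such as pysam would be more robust.

```python
def parse_record(line):
    """Split a single-sample VCF data line into fixed fields and a
    FORMAT-key -> sample-value dictionary.

    Assumption: whitespace-separated columns, one sample, and the last two
    columns are FORMAT and the sample (as in the trimmed records above).
    """
    cols = line.split()
    chrom, pos, _id, ref, alt = cols[:5]
    fmt_keys = cols[-2].split(":")      # e.g. GT, DP, RO, QR, AO, QA, GL
    sample_vals = cols[-1].split(":")   # values in the same order as the keys
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alts": alt.split(","),         # ALT may list several alleles
        "sample": dict(zip(fmt_keys, sample_vals)),
    }


rec = parse_record(
    "chr24 51 . C G 1458.24 . GT:DP:RO:QR:AO:QA:GL 1/1:50:0:0:50:1802:-5,-5,0"
)
print(rec["sample"]["GT"], rec["sample"]["GL"])  # 1/1 -5,-5,0
```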
Based on a post I read on SEQanswers, using the genotype likelihoods (the PL field in samtools VCFs, GL in freebayes VCFs), we have:
P(D|CC) = 10^(-5) = 0.00001
P(D|CG) = 10^(-5) = 0.00001
P(D|GG) = 10^(0) = 1
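GL stores log10-scaled likelihoods, so a value x corresponds to a likelihood of 10^x, while PL (samtools) stores phred-scaled values, -10 * log10(likelihood), conventionally shifted so the most likely genotype gets 0. A small sketch of both conversions follows; the helper names are hypothetical, and the normalization step assumes a flat prior over the three genotypes, which is not necessarily what the caller itself uses.

```python
def gl_to_likelihoods(gl_field):
    """Convert a GL string (log10-scaled likelihoods) to plain likelihoods."""
    return [10 ** float(x) for x in gl_field.split(",")]


def normalize(likelihoods):
    """Rescale so the values sum to 1 (assumes a flat prior over genotypes)."""
    total = sum(likelihoods)
    return [x / total for x in likelihoods]


def gl_to_pl(gl_field):
    """Convert GL (log10) to PL (phred-scaled, best genotype shifted to 0)."""
    gls = [float(x) for x in gl_field.split(",")]
    best = max(gls)
    return [round(-10 * (g - best)) for g in gls]


lik = gl_to_likelihoods("-5,-5,0")   # likelihoods for CC, CG, GG
print(lik)
print(normalize(lik))                # almost all weight on the third genotype
print(gl_to_pl("-5,-5,0"))           # [50, 50, 0]
```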
So in this case, can we say that my individual is homozygous (GG)?
Can we say that at position 51 of chromosome 24 the reference base is C, and my individual is homozygous for G?
Another example:
chr24 55172 . G T 273.651 . GT:DP:RO:QR:AO:QA:GL 0/1:14:4:143:10:362:-5,0,-5
Here we have:
P(D|GG) = 10^(-5) = 0.00001
P(D|GT) = 10^(0) = 1
P(D|TT) = 10^(-5) = 0.00001
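The three GL values come in a fixed order defined by the VCF specification: for a diploid sample, genotype j/k (with j <= k) sits at index k*(k+1)/2 + j, which for a single ALT allele gives REF/REF, REF/ALT, ALT/ALT. Here is a sketch that labels each GL entry with its genotype and picks the largest one; the function names are hypothetical, and it assumes a diploid sample.

```python
def genotype_order(ref, alts):
    """List diploid genotypes in the order used by the GL/PL fields.

    Genotype j/k (j <= k) is at index k*(k+1)/2 + j, so with one ALT allele
    the order is REF/REF, REF/ALT, ALT/ALT.
    """
    alleles = [ref] + alts
    order = []
    for k in range(len(alleles)):
        for j in range(k + 1):
            order.append((j, k, alleles[j] + alleles[k]))
    return order


def most_likely_genotype(ref, alts, gl_field):
    """Return (j, k, bases) for the genotype with the highest GL value."""
    gls = [float(x) for x in gl_field.split(",")]
    order = genotype_order(ref, alts)
    if len(gls) != len(order):
        raise ValueError("number of GL values does not match a diploid genotype list")
    best = max(range(len(gls)), key=lambda i: gls[i])
    return order[best]


print(most_likely_genotype("C", ["G"], "-5,-5,0"))  # (1, 1, 'GG')
print(most_likely_genotype("G", ["T"], "-5,0,-5"))  # (0, 1, 'GT')
```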
So in this case, can we say that my individual is heterozygous (GT)?
And what exactly does that mean?
Does it mean that at position 55172 of chromosome 24 the reference base is G and the base in my individual is T, but heterozygous (G/T)? Or do REF and ALT represent the two alleles of the SNP carried by my individual?
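In GT the numbers are allele indices, not bases: 0 refers to the REF allele and 1 to the first ALT allele (2 to a second ALT, if present), with "/" for unphased and "|" for phased genotypes. So 0/1 at position 55172 means one copy of G (REF) and one copy of T (ALT), i.e. the individual is G/T, while 1/1 at position 51 means two copies of G. A sketch of that decoding, with `decode_gt` as a hypothetical helper:

```python
import re


def decode_gt(gt, ref, alts):
    """Translate a GT string like '0/1' into the individual's bases and a zygosity label.

    Allele index 0 is REF, 1 is the first ALT allele, 2 the second, and so on.
    """
    alleles = [ref] + alts
    idx = re.split(r"[/|]", gt)          # handles unphased '/' and phased '|'
    if "." in idx:
        return None, "missing"           # no call for at least one allele
    bases = [alleles[int(i)] for i in idx]
    if len(set(idx)) > 1:
        zygosity = "heterozygous"
    elif idx[0] == "0":
        zygosity = "homozygous reference"
    else:
        zygosity = "homozygous alternate"
    return bases, zygosity


print(decode_gt("0/1", "G", ["T"]))  # (['G', 'T'], 'heterozygous')
print(decode_gt("1/1", "C", ["G"]))  # (['G', 'G'], 'homozygous alternate')
```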
I am really sorry if my description is a little confusing.
Thank you very much in advance.
Why not just trust the GT field?
Do you mean that 0/1 is heterozygous and 1/1 is homozygous?
I just came across a case where GT is 1/1 but GL is -5,0,-2.54.
Of course, RO and AO are very low here (5 and 2 respectively; the depth at this position is only 7).
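One quick way to spot records like this is to compare GT with the genotype that has the highest GL, and to look at the ALT allele fraction AO / (RO + AO). The sketch below assumes a biallelic site and uses hypothetical helper names. A caller may combine the likelihoods with priors before choosing GT, so a mismatch flags a call worth a closer look rather than proving it wrong; at a depth of only 7 the evidence for any genotype is limited.

```python
def gl_best_gt(gl_field):
    """Return the biallelic GT ('0/0', '0/1' or '1/1') with the highest GL value."""
    gls = [float(x) for x in gl_field.split(",")]
    if len(gls) != 3:
        raise ValueError("this sketch only handles biallelic diploid sites")
    best = max(range(3), key=lambda i: gls[i])
    return ["0/0", "0/1", "1/1"][best]


def check_call(gt, gl_field, ro, ao):
    """Compare the reported GT with the GL argmax and report the ALT allele fraction."""
    best = gl_best_gt(gl_field)
    reported = gt.replace("|", "/")      # treat phased and unphased the same
    alt_fraction = ao / (ro + ao) if ro + ao else float("nan")
    return {
        "gt": reported,
        "gl_best": best,
        "agree": reported == best,
        "alt_fraction": alt_fraction,
    }


print(check_call("1/1", "-5,0,-2.54", ro=5, ao=2))
```

For the record described above (GT 1/1, GL -5,0,-2.54, RO 5, AO 2), the GL values favour 0/1 and the ALT fraction is 2/7, so GT and GL do not agree.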
Whether I trust GT, GL, or even RO and AO, I cannot interpret the results. I mean, what exactly does heterozygosity mean in a VCF? That REF and ALT refer to the two different alleles of the SNP in my individual, or that REF is simply the reference base?
yes
Thank you very much. Could you explain, if possible, what exactly a heterozygous SNP means in a VCF file? How should the record of a heterozygous SNP be interpreted?
Do REF and ALT represent the two alleles of the SNP carried by my individual, or is REF just the reference base and ALT the base in my sequence, with no other information about zygosity?
Thank you very much and I am sorry for being so annoying, but this information is very valuable to me.