Take one VCF file for example,
##fileformat=VCFv4.1
##samtoolsVersion=0.1.16 (r963:234)
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the site allele frequency of the first ALT allele">
....
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SLVi33.16 SLVi33.25 SLVi33.26
chr1 85691 . C T 6.26 . DP=2;AF1=1;CI95=0.3333,1;DP4=0,0,2,0;MQ=40;FQ=-28.9 GT:PL:GQ 0/1:0,0,0:3 1/1:20,3,0:4 1/1:18,3,0:4
chr1 88338 . G A 6.95 . DP=3;AF1=1;CI95=0.1667,1;DP4=0,0,2,1;MQ=30;FQ=-28.1 GT:PL:GQ 0/1:0,0,0:3 0/1:0,0,0:3 1/1:38,9,0:8
The DP=2 at chr1:85691, however, there are three individuals in the file, why does every individual has genotype? As AF1 means allele frequency of ALT allele, it supposed to be AF1=5/6 at chr1:85691, why it's truely AF1=1? I'm so confused, anybody can help me? Thanks in advance.
Thanks so much. You've got the point. When I add flag
-D
in samtools to call the variants, it keep read depth for each individual. What I get for chr1:85691 is,That's exactly what you said
So if I want to extract the genotype from VCF generated by samtools, I must be very careful about the DP, I should mark the genotype as missing data for samples with DP=0.
If DP of 2 means 2 reads out of 3 individuals, for example, if the 2 reads come from SLVi33.16 and SLVi33.25 instead of SLVi33.26, the genotype symbol for SLVi33.26 should be "./.". But in the file, the genotype symbols are 0/1, 1/1 and 1/1, which means the genotype for these 3 individuals is CT, TT and TT(T5+C1), respectively (and that why I said before AF1=5/6). In other word,we know the genotypes for 3 individuals, the DP should be greater than 6, but it's DP=2 here. How to explain this discordance? Thanks
Yeah I see what you mean. I think this maybe an error on samtools part. Samtools is pretty bad at outputting the genotype information. The fact that the PL fields show 0,0,0 pretty much means there is no read covering it.
Thanks so much. You've got the point. When I add flag
-D
in samtools to call the variants, it keep read depth for each individual. What I get forchr1:85691
is,That's exactly what you said