Hello,
I want to extract the genotype of the germline variant. As it is a human sample, it is expected to be diploid (2n).
In the example below, three alleles (A, T and G) were observed at one SNP site. The genotype was called from the alleles with the most reads, so sample#1 is 0/2 and sample#2 is 1/2.
However, I want to use only alleles with AD > 100 and call the genotype from those alleles alone. The genotype I want for both sample#1 and sample#2 is therefore 2/2. (When I simply filtered on AD < 100, the genotype was called 0/0, so it seems that additional filtering is needed.)
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample#1 sample#2
rs00001 174 . A T,G 46256.89 PASS AC=1,2;AF=0.250,0.500;AN=4;BaseQRankSum=3.60;DP=22344;ExcessHet=3.0103;FS=0.000;MLEAC=1,2;MLEAF=0.250,0.500;MQ=60.00;MQRankSum=14.12;QD=14.52;ReadPosRankSum=4.62;SOR=1.350 GT:AD:DP:GQ:PL 0/2:7,4,2786:2808:22:38023,37997,38116,0,76,22 1/2:1,10,377:391:99:8253,8071,8052,195,0,482
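To make the rule concrete, here is a minimal Python sketch of the logic I have in mind. It is not an existing tool: the function name genotype_from_ad is made up for illustration. Returning ./. when no allele passes, and keeping only the two deepest alleles when more than two pass, are my own assumptions.

```python
def genotype_from_ad(ad, min_ad=100):
    """Build a diploid genotype from the alleles whose AD is above min_ad.

    ad is the list of per-allele depths from the AD field
    (index 0 = REF, index 1.. = ALT alleles in VCF order).
    """
    # Indices of the alleles that pass the depth threshold.
    kept = [i for i, depth in enumerate(ad) if depth > min_ad]

    if not kept:
        # Assumption: no allele passes, so report a missing genotype.
        return "./."
    if len(kept) == 1:
        # Only one allele passes, so the call is homozygous for it.
        return f"{kept[0]}/{kept[0]}"

    # Assumption: if more than two alleles pass, keep the two deepest.
    best_two = sorted(kept, key=lambda i: ad[i], reverse=True)[:2]
    a, b = sorted(best_two)
    return f"{a}/{b}"


print(genotype_from_ad([7, 4, 2786]))   # sample#1 AD -> 2/2
print(genotype_from_ad([1, 10, 377]))   # sample#2 AD -> 2/2
```

For both samples only allele 2 (G) has AD > 100, so both come out as 2/2, which is the result I want.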
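Additionally, after applying the AD > 100 filter, I want to take the allele balance (AB) of heterozygous calls into account. Here AD().1 means the first AD value, i.e. the read depth of the reference allele:
If AD().1/DP > 0.7, call homozygous ref (0/0)
If AD().1/DP < 0.3, call homozygous alt (1/1)
If 0.3 < AD().1/DP < 0.7, call both the ref and alt alleles (heterozygous, 0/1)
In the example below, the genotype of both sample#1 and sample#2 should be called 1/1, because the ref fraction is 1182/5974 ≈ 0.20 for sample#1 and 1263/4950 ≈ 0.26 for sample#2.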
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample#1 sample#2
rs00002 149 . G A 176718.50 PASS AC=2;AF=0.500;AN=4;BaseQRankSum=1.76;DP=10932;ExcessHet=4.7712;FS=0.000;MLEAC=2;MLEAF=0.500;MQ=60.00;MQRankSum=3.000e-03;QD=16.18;ReadPosRankSum=4.69;SOR=0.693 GT:AD:DP:GQ:PL 0/1:1182,4788:5974:99:100100,0,11535 0/1:1263,3687:4950:99:76628,0,17056
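The allele balance rule for a biallelic site can be sketched in Python as follows. Again, this is only an illustration of the intended logic, not a finished solution: the function names are hypothetical, the FORMAT string is taken from the record above, and treating a ref fraction of exactly 0.3 or 0.7 as heterozygous is my assumption, since the rule does not say what to do at the boundaries.

```python
def call_by_ref_fraction(ref_depth, total_depth, low=0.3, high=0.7):
    """Re-call a biallelic genotype from the ref allele fraction AD[0]/DP."""
    if total_depth == 0:
        # Assumption: no coverage means a missing genotype.
        return "./."
    ref_fraction = ref_depth / total_depth
    if ref_fraction > high:
        return "0/0"   # mostly ref reads: homozygous ref
    if ref_fraction < low:
        return "1/1"   # mostly alt reads: homozygous alt
    # Assumption: the boundary values 0.3 and 0.7 are treated as heterozygous.
    return "0/1"


def recall_sample(sample_field, format_field="GT:AD:DP:GQ:PL"):
    """Parse one VCF sample column and apply the allele balance rule."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    ref_depth = int(values["AD"].split(",")[0])   # first AD value = ref depth
    return call_by_ref_fraction(ref_depth, int(values["DP"]))


print(recall_sample("0/1:1182,4788:5974:99:100100,0,11535"))  # sample#1 -> 1/1
print(recall_sample("0/1:1263,3687:4950:99:76628,0,17056"))   # sample#2 -> 1/1
```

Both samples have a ref fraction below 0.3, so both are re-called as 1/1.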
I have tried several approaches, but I am asking because I find it difficult to set up this filter. Solutions using tools other than VariantFiltration (SelectVariants, vcffilter, bcftools, SnpSift) are also welcome. Thank you for reading; I would appreciate an answer from anyone who knows how to do this.