I have a multisample VCF file and would like to edit heterozygous genotypes with a low or high allelic balance (AB). AB is defined as the ratio of reads showing the reference allele to all reads.
If the AB is below 0.25, I would like to change the genotype to 1/1 (homozygous for the alternate allele). If the AB is above 0.75, I would like to change the genotype to 0/0 (homozygous for the reference allele).
Unfortunately, I don't have the AB information for each individual in my VCF. Therefore, I was planning to use RO / DP instead. RO is the count of full observations of the reference haplotype and DP is the total number of reads. Both tags are specified in the per-sample genotype columns, meaning that they are specific to every individual.
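To make the definition concrete, this is roughly how I would compute AB for one sample from the FORMAT column and that sample's column. It is only a sketch: the function name allelic_balance is made up for illustration, and it assumes RO and DP are plain integers in the sample column.

```python
def allelic_balance(format_field, sample_field):
    """Return RO / DP for one sample, or None if it cannot be computed."""
    # Pair each FORMAT key (e.g. GT:DP:RO) with this sample's value.
    fields = dict(zip(format_field.split(":"), sample_field.split(":")))
    try:
        ro = int(fields["RO"])
        dp = int(fields["DP"])
    except (KeyError, ValueError):
        return None  # tag absent, or missing value "."
    if dp == 0:
        return None  # no reads, AB is undefined
    return ro / dp


# Made-up example: 4 reference reads out of 20 in total.
print(allelic_balance("GT:DP:RO:AO", "0/1:20:4:16"))  # 0.2
```

A sample with an AB of 0.2 would fall below my 0.25 cutoff.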
Apparently, the way to proceed in such cases is described here: https://software.broadinstitute.org/gatk/documentation/article?id=12350
I have been trying to use something similar to this to tag the variants that I want to convert:
gatk VariantFiltration -R /ref.fasta -V input.vcf -O output.vcf --genotype-filter-expression 'isHet == 1 && RO / DP < 0.25 ' --genotype-filter-name 'lowAB'
But somehow this just adds an annotation to all heterozygous genotypes, regardless of their RO / DP ratio...
As far as I know, GATK only allows you to convert the tagged calls to no-calls. Therefore, I think I need a script that changes the genotype fields manually wherever the tag is present.
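Alternatively, the script could skip the tagging step and apply the RO / DP thresholds directly. Below is a minimal sketch of what I have in mind, not a tested solution. It assumes an uncompressed, tab-delimited VCF with biallelic sites, and the function name recode_het_genotypes is hypothetical. It only rewrites GT, so any other per-sample tags (e.g. genotype qualities or likelihoods) would no longer match the new genotype.

```python
HET_GENOTYPES = {"0/1", "1/0", "0|1", "1|0"}


def recode_het_genotypes(line, low=0.25, high=0.75):
    """Rewrite het genotypes on one VCF line based on AB = RO / DP."""
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    cols = line.rstrip("\n").split("\t")
    keys = cols[8].split(":")  # FORMAT column
    if not {"GT", "RO", "DP"} <= set(keys):
        return line  # required tags are not present at this site
    gt_i, ro_i, dp_i = keys.index("GT"), keys.index("RO"), keys.index("DP")
    for i in range(9, len(cols)):  # sample columns start at index 9
        values = cols[i].split(":")
        if len(values) <= max(gt_i, ro_i, dp_i):
            continue  # trailing fields were dropped for this sample
        if values[gt_i] not in HET_GENOTYPES:
            continue
        try:
            ro, dp = int(values[ro_i]), int(values[dp_i])
        except ValueError:
            continue  # missing value "."
        if dp == 0:
            continue
        ab = ro / dp
        if ab < low:
            values[gt_i] = "1/1"  # few reference reads: call hom-alt
        elif ab > high:
            values[gt_i] = "0/0"  # mostly reference reads: call hom-ref
        cols[i] = ":".join(values)
    return "\t".join(cols) + "\n"


# Made-up record with three heterozygous samples (AB = 0.1, 0.9, 0.5).
example = "\t".join(
    ["chr1", "100", ".", "A", "G", "50", "PASS", ".", "GT:DP:RO",
     "0/1:20:2", "0/1:20:18", "0/1:20:10"]
) + "\n"
print(recode_het_genotypes(example), end="")
```

For the example record, the first sample becomes 1/1, the second becomes 0/0, and the third stays 0/1. To process a whole file, I would loop over the lines of input.vcf and write each returned line to a new file.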