I am trying to filter a VCF file on its INFO column. Some of my INFO fields have missing values, such as "DP=.". When I run the GATK VariantFiltration tool with the option -filter "DP>50", it throws an error: java.lang.NumberFormatException for input string ".". The same thing happens with many other keys in the INFO column of the VCF file. I also tried using -filter "DP!='.' && DP>50". How can I overcome this issue?
GATK is particularly sensitive to type, and you can get weird results if you try to mix and match types within a field. Here DP is being compared as a number, so the string "." cannot be parsed, hence the NumberFormatException. Perhaps the more relevant question is why your DP fields are "." in the first place. If you don't have a depth of at least 1, how do you have any evidence for that site to potentially harbor a variant?
A quick workaround that should cause GATK to exit cleanly would be to replace these strings with DP=0, and then your filter should work. Something like sed is your friend here and will be quite fast.
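If you would rather not run a blanket sed substitution over the whole file, here is a minimal Python sketch of the same workaround. It only touches the INFO column (column 8) of data lines and only rewrites an entry that is exactly DP=. (so DP values inside other keys are left alone). The function name zero_missing_dp and the script name below are my own, not anything from GATK, and this assumes an uncompressed, tab-delimited VCF.

```python
import sys


def zero_missing_dp(line: str, key: str = "DP") -> str:
    """Replace `key=.` with `key=0` in the INFO column of one VCF line."""
    if line.startswith("#"):
        return line  # header lines pass through unchanged

    # Remember whether the line ended with a newline so we can restore it.
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return line  # not a well-formed record; leave it untouched

    missing = f"{key}=."
    entries = fields[7].split(";")  # INFO is the 8th tab-separated column
    fields[7] = ";".join(f"{key}=0" if e == missing else e for e in entries)
    return "\t".join(fields) + newline


if __name__ == "__main__":
    # Reads a VCF on stdin and writes the patched VCF to stdout.
    for raw in sys.stdin:
        sys.stdout.write(zero_missing_dp(raw))
```

Saved as, say, zero_missing_dp.py (a made-up name), it would be run as `python zero_missing_dp.py < input.vcf > patched.vcf`, and the patched file then goes to GATK. Keep in mind that this turns "unknown depth" into "zero depth", so only do it if that is acceptable for your downstream analysis.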