There's usually a second quality score in the file as well. If your file has a FORMAT column like "GT:PL:GQ", then each sample column will look something like "1/1:255,255,0:99". That third field, GQ (genotype quality), is also a quality score; it ranges from 0 to 99, because it is typically capped at 99.
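To see how the FORMAT keys line up with the values in a sample column, here is a minimal Python sketch. It assumes a plain-text VCF line has already been split on tabs, and `parse_sample` is a hypothetical helper name, not part of any VCF library.

```python
def parse_sample(format_field, sample_field):
    """Pair each FORMAT key (e.g. GT, PL, GQ) with the value at the same position in a sample column."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # zip() stops at the shorter list, which also tolerates sample columns
    # where trailing fields have been dropped
    return dict(zip(keys, values))

fields = parse_sample("GT:PL:GQ", "1/1:255,255,0:99")
print(fields)             # {'GT': '1/1', 'PL': '255,255,0', 'GQ': '99'}
print(int(fields["GQ"]))  # 99
```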
GQ is defined as a Phred-scaled quality: GQ = -10 log10 P(genotype call is wrong). So again, a high number means the genotype call is likely to be correct, which for a non-reference call means the SNP is likely to be real.
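Inverting that definition gives the probability that the call is wrong, P = 10^(-GQ/10). The short Python sketch below does this conversion; `phred_to_error_prob` is just an illustrative name.

```python
def phred_to_error_prob(q):
    """Invert Q = -10 * log10(p): return the probability p that the call is wrong."""
    return 10 ** (-q / 10)

for gq in (10, 20, 30, 99):
    # GQ 10 -> about 1 in 10, GQ 20 -> about 1 in 100, GQ 30 -> about 1 in 1000
    print(gq, phred_to_error_prob(gq))
```

Every 10 points of GQ is a tenfold drop in the estimated error probability, so the difference between GQ 20 and GQ 30 is larger than it looks.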
Which of the two is better to use, I don't know. QUAL behaves differently when your VCF contains multiple BAMs: there is a single QUAL per site covering all samples, whereas each BAM (sample) gets its own GT:PL:GQ at each putative polymorphic locus.
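The Python sketch below illustrates the difference on one multi-sample data line: QUAL is read once from column 6, while GQ is read separately from each sample column. The function name, the sample names, and the record itself are hypothetical; in a real VCF the sample names come from the `#CHROM` header line.

```python
def site_and_sample_quals(vcf_line, sample_names):
    """Return the site-level QUAL and a per-sample GQ dict for one tab-separated VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    # Column 6 (index 5) is QUAL: one value for the whole site, all samples together
    qual = None if cols[5] == "." else float(cols[5])
    fmt_keys = cols[8].split(":")
    gqs = {}
    # Columns 10 onward (index 9+) are the per-sample columns, one per BAM/sample
    for name, sample in zip(sample_names, cols[9:]):
        values = dict(zip(fmt_keys, sample.split(":")))
        gq = values.get("GQ", ".")
        gqs[name] = None if gq == "." else int(gq)
    return qual, gqs

# Made-up two-sample record, for illustration only
line = "chr1\t12345\t.\tA\tG\t180.5\t.\tDP=40\tGT:PL:GQ\t1/1:255,255,0:99\t0/1:40,0,12:12"
print(site_and_sample_quals(line, ["sample_A", "sample_B"]))
# (180.5, {'sample_A': 99, 'sample_B': 12})
```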
I've done a bit of Sanger confirmation of SNPs called in an exome capture project, and, for what it's worth, I found that most of the entries called as homozygous SNPs in the VCF, even with poor GQs, turned out to be real. But again, with exome capture I was expecting a good number of SNPs, and I was counting SNPs that were off target and therefore expected to have low coverage. In a sample where the whole genome is expected to be well covered, like a whole microbial genome, low-quality SNPs might be less likely to be real.
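I'd also consider looking at DP4, the INFO field written by samtools/bcftools that counts high-quality reference-forward, reference-reverse, alternate-forward, and alternate-reverse bases at the site. In some cases, coverage is a good proxy for quality.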
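As a sketch of how DP4 might be used, the Python function below pulls it out of an INFO string and derives the total depth and the fraction of alternate bases. The function name and the example INFO string are hypothetical; keeping the per-strand alternate counts also lets you check whether the alternate reads come from both strands.

```python
def parse_dp4(info_field):
    """Extract DP4 (ref-fwd, ref-rev, alt-fwd, alt-rev base counts) from an INFO string."""
    for entry in info_field.split(";"):
        if entry.startswith("DP4="):
            ref_f, ref_r, alt_f, alt_r = (int(x) for x in entry[4:].split(","))
            depth = ref_f + ref_r + alt_f + alt_r
            # Fraction of high-quality bases supporting the alternate allele
            alt_frac = (alt_f + alt_r) / depth if depth else 0.0
            return {"depth": depth, "alt_fraction": alt_frac,
                    "alt_forward": alt_f, "alt_reverse": alt_r}
    return None  # this record has no DP4 (e.g. a caller that does not emit it)

print(parse_dp4("DP=25;DP4=1,0,12,10;MQ=60"))
```

Here 22 of 23 high-quality bases support the alternate allele, split 12/10 across strands, which is what you would hope to see for a real homozygous SNP.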
A page about interpreting Phred-scaled quality scores: https://gatk.broadinstitute.org/hc/en-us/articles/360035531872-Phred-scaled-quality-scores