Hey, didn't you have another account? - your profile photo looks familiar.
The only way to confirm a variant is through the use of an ancillary method, like Sanger sequencing. NGS always struggles to correctly call variants in repeat regions whose length approaches the average read length that you're using. Why? In part, it is due to misalignment in these regions. Even prior to in silico alignment, homopolymers like AAAAAAA, GGGGGGGG, etc., can be difficult to sequence faithfully during the sequencing run itself.
To guard against errors in repeat regions, you can apply some basic QC thresholds:
- Prior to alignment, trim bases at read ends whose average base qualities fall below 30
- Prior to alignment, eliminate short reads
- Prior to variant calling, eliminate reads with MAPQ below a cutoff of your choosing (e.g. 40, 50, or 60)
- Require that variants are called at a minimum read depth of 18
- Require that variants have 'high' genotype qualities (at least 30)
- Only look at variants that pass a threshold for strand bias (given by the PV4 tag)
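
As a rough illustration of the last three thresholds, here is a minimal Python sketch that checks a single-sample VCF record. It makes several assumptions: the file is samtools/bcftools-style output, read depth is in the INFO `DP` tag, genotype quality is in the FORMAT `GQ` field, and the first of the four `PV4` values is the strand bias p-value. The function names, the 0.05 strand bias cutoff, and the choice to let records without `PV4` through are my own for the example, not anything standard.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=25;PV4=0.4,1,0.2,1' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info


def passes_filters(vcf_line, min_depth=18, min_gq=30, min_strand_p=0.05):
    """Return True if a single-sample VCF record passes all three thresholds.

    min_strand_p is a hypothetical cutoff; pick one that suits your data.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    info = parse_info(fields[7])                              # INFO column
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))

    # Depth and genotype quality; missing values count as a failure.
    depth = int(info.get("DP") or 0)
    gq_raw = sample.get("GQ", ".")
    gq = float(gq_raw) if gq_raw != "." else 0.0
    if depth < min_depth or gq < min_gq:
        return False

    # Assumption: the first PV4 value is the strand bias p-value, and a
    # record with no PV4 tag is treated as showing no evidence of bias.
    if "PV4" in info:
        strand_p = float(info["PV4"].split(",")[0])
        if strand_p < min_strand_p:
            return False
    return True


# Made-up records for illustration only.
good = "chr1\t100\t.\tA\tG\t50\t.\tDP=25;PV4=0.4,1,0.2,1\tGT:GQ\t0/1:45"
shallow = "chr1\t200\t.\tC\tT\t50\t.\tDP=10;PV4=0.4,1,0.2,1\tGT:GQ\t0/1:45"
biased = "chr1\t300\t.\tG\tA\t50\t.\tDP=40;PV4=0.0001,1,0.2,1\tGT:GQ\t0/1:60"

for record in (good, shallow, biased):
    print(record.split("\t")[1], passes_filters(record))
```

In practice you would do this with your variant caller's own filtering options rather than by hand; the point is only to show what each threshold is checking.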