I am currently analyzing somatic SNVs in matched tumor-normal samples, and I would like to hear your opinions on filtering variants in low-complexity (LC) regions. LC regions have previously been shown to be hotspots for SNP and especially indel calls, potentially due to PCR amplification errors in long homopolymer stretches and to alignment errors.
My pipeline so far uses BWA-MEM for alignment to hg38, followed by mpileup/VarScan2, the VarScan2 false-positive filter, and removal of variants annotated in the 1000 Genomes Project (1KG). We downloaded our datasets, so we have no way to confirm any of the variants ourselves.
Therefore I would like to ask for your opinion and experience on how reliable somatic SNVs in LC regions are (LC regions were obtained by running Heng Li's sdust implementation on the hg38 FASTA). Some groups, e.g. bcbio, report that they categorically exclude LC variants, but I am really wondering about the false-negative rate that this introduces. This is especially important because some of the regions we are interested in (both coding and non-coding) are almost or fully located in LC regions, so I would have a hard time excluding them categorically. In the end, we will have to confirm the interesting regions by targeted sequencing of a patient cohort, but for now I would appreciate your opinions.
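
To make the alternative to categorical exclusion concrete, here is a minimal sketch of flagging calls that fall in LC regions instead of dropping them, so they can be reviewed or filtered more strictly later. It assumes the sdust output is a tab-separated, BED-like list of `chrom`, 0-based `start`, and exclusive `end` (please check this against your own file), and that variant positions are 1-based as in VCF. The function names `load_lc_intervals` and `in_lc` are made up for illustration and are not part of any tool in my pipeline.

```python
from bisect import bisect_right
from collections import defaultdict


def load_lc_intervals(bed_lines):
    """Parse BED-like lines into sorted, merged intervals per chromosome.

    Assumption: columns are chrom, 0-based start, exclusive end.
    """
    raw = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, start, end = line.split("\t")[:3]
        raw[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_lc(lc, chrom, pos):
    """Return True if the 1-based position lies inside an LC interval."""
    intervals = lc.get(chrom, [])
    pos0 = pos - 1  # convert the VCF coordinate to 0-based
    # Index of the last interval whose start is <= pos0.
    i = bisect_right(intervals, (pos0, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]


# Toy data, invented for illustration only.
lc = load_lc_intervals(["chr1\t100\t120", "chr1\t115\t130", "chr2\t5\t10"])
calls = [("chr1", 101, "A", "T"), ("chr1", 131, "G", "C"), ("chr2", 7, "C", "A")]
flagged = [(chrom, pos, ref, alt, in_lc(lc, chrom, pos))
           for chrom, pos, ref, alt in calls]
```

With the calls flagged like this, one could compare allele frequencies or supporting read counts between LC and non-LC calls before deciding on a threshold, rather than losing the LC calls outright.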