Should we trust genotypes called in simple tandem repeat regions?
2.5 years ago
samuelandjw ▴ 250

Hello. I am searching genomes (WGS) or exomes (WES) of patients with rare diseases for potential disease-causing variants, so the accuracy of each genotype for each patient is vital. I'm using GATK 4 to joint-call genotypes across the patient cohort. I filter out genotypes with low DP or low GQ (by setting those genotypes to missing). I noticed that some of the called genotypes are located in simple tandem repeat regions (according to RepeatMasker).
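For readers following along, the two steps described above (masking low-DP/low-GQ genotypes and checking whether a variant falls in an annotated repeat) can be sketched roughly as follows. This is only an illustration, not the poster's actual pipeline: the thresholds `MIN_DP` and `MIN_GQ`, the function names, and the assumption that repeat intervals are 0-based half-open (BED-style) coordinates for a single chromosome are all hypothetical.

```python
from bisect import bisect_right

MIN_DP = 10  # assumed depth threshold, not taken from the post
MIN_GQ = 20  # assumed genotype-quality threshold, not taken from the post


def mask_genotype(gt, dp, gq, min_dp=MIN_DP, min_gq=MIN_GQ):
    """Return './.' when DP or GQ is missing or below its threshold."""
    if dp is None or gq is None or dp < min_dp or gq < min_gq:
        return "./."
    return gt


def build_index(intervals):
    """Sort and merge 0-based half-open (start, end) intervals of one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent interval: extend the previous one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    starts = [s for s, _ in merged]
    return starts, merged


def in_repeat(pos, index):
    """True if a 1-based VCF position lies inside any indexed interval."""
    starts, merged = index
    p0 = pos - 1  # VCF is 1-based, BED-style intervals are 0-based
    i = bisect_right(starts, p0) - 1
    return i >= 0 and p0 < merged[i][1]


# Toy repeat annotation for one chromosome (made-up coordinates).
repeats = build_index([(100, 120), (110, 130), (200, 210)])
```

Flagging rather than dropping the variants that overlap a repeat keeps the decision about SNPs versus indels open for later review.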

Given that NGS performs poorly in repetitive regions of the genome, genotypes called in those regions are expected to be of lower quality, but I do not know what to do with them. Some sources suggest filtering out in-frame indels called in those regions but retaining the rest. What about SNPs and frameshift indels? What is the rationale for retaining or filtering particular types of variants in simple tandem repeat regions?

sequencing GATK WGS WES • 576 views
2.5 years ago

The problem in low complexity regions is that the alignments themselves may be fundamentally incorrect, so it can be extremely challenging to determine which variant is present from short reads alone.
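One way to see why placement inside a tandem repeat is ambiguous: deleting one repeat unit at many different positions yields exactly the same sequence, so the reads carry no information about which placement is "right". The toy sketch below uses a made-up reference and hypothetical helper names; it illustrates the ambiguity only and does not model how any particular aligner or variant caller behaves.

```python
def delete(ref, start, length):
    """Return ref with `length` bases removed starting at 0-based `start`."""
    return ref[:start] + ref[start + length:]


def equivalent_deletions(ref, length):
    """Group deletion start positions by the haplotype sequence they produce."""
    groups = {}
    for start in range(len(ref) - length + 1):
        groups.setdefault(delete(ref, start, length), []).append(start)
    return groups


# Toy reference: unique flanks around a (CA)x5 dinucleotide repeat.
ref = "GGT" + "CA" * 5 + "TTG"

groups = equivalent_deletions(ref, 2)

# The largest group holds all placements that give the same haplotype.
ambiguous = max(groups.values(), key=len)
print(ambiguous)
```

In this toy case every 2 bp deletion starting anywhere within the repeat produces the same haplotype, which is why indel representations are usually normalized (for example by left-alignment) before comparison, and why an independent method such as Sanger sequencing remains useful for confirmation.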

In the paper that you cite they state:

In all 35 cases, the single nucleotide variant (SNV) was confirmed by Sanger sequencing.

In the end, perhaps that is the only way to know for sure, particularly when you are detecting a novel variant that falls into a low complexity region.
