Could anyone walk me through the logic of what is happening here?
Number of alts indicates it could be an autosomal false positive.
Potential polymorphic NuMT annotation compares the number of alts to the autosomal coverage. Polymorphic NuMTs (NuMT sequence that is not in the reference) will result in autosomal reads that map to the mitochondria. This annotation notes when the number of alt reads falls within 80% of the autosomal coverage distribution, assuming that autosomal coverage is a Poisson distribution given a central statistic of the depth (median is recommended). This will also include true low allele fraction mitochondrial variants and should be used as an annotation, rather than a filter.
Caveat: This annotation can only be calculated in Mutect2 if the median-autosomal-coverage argument is provided.
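To make the quoted description concrete, here is a minimal Python sketch of the check. It makes two assumptions that the documentation does not spell out. First, "within 80% of the autosomal coverage distribution" is read as the central 80%, meaning the 10th to 90th percentile of a Poisson distribution whose mean is the median autosomal coverage. Second, the bounds are treated as inclusive. The function names (`numt_bounds`, `is_potential_polymorphic_numt`) are hypothetical, and this is not GATK's actual code.

```python
import math
from typing import Tuple


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space so large lam does not underflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_quantile(p: float, lam: float) -> int:
    """Smallest k with P(X <= k) >= p (the usual discrete quantile definition)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    cdf, k = 0.0, 0
    while True:
        cdf += poisson_pmf(k, lam)
        if cdf >= p:
            return k
        k += 1


def numt_bounds(median_autosomal_coverage: float, central_mass: float = 0.8) -> Tuple[int, int]:
    """Central `central_mass` interval of Poisson(median autosomal coverage).

    ASSUMPTION: "within 80% of the distribution" is read as the central 80%,
    i.e. between the 10th and 90th percentiles.
    """
    tail = (1.0 - central_mass) / 2.0
    return (poisson_quantile(tail, median_autosomal_coverage),
            poisson_quantile(1.0 - tail, median_autosomal_coverage))


def is_potential_polymorphic_numt(alt_count: int, median_autosomal_coverage: float) -> bool:
    """Hypothetical flag (annotation, not filter): does the alt-read count look like autosomal depth?"""
    lower, upper = numt_bounds(median_autosomal_coverage)
    # ASSUMPTION: inclusive bounds.
    return lower <= alt_count <= upper


if __name__ == "__main__":
    cov = 30.0  # hypothetical median autosomal depth (30x WGS)
    print("bounds:", numt_bounds(cov))
    for alt in (3, 28, 100):
        print(alt, is_potential_polymorphic_numt(alt, cov))  # 3 -> False, 28 -> True, 100 -> False
```

With a median autosomal depth of about 30x, an alt count near 30 gets flagged, and counts far below or above that range do not. Mitochondrial depth is often in the thousands, so 28 alt reads at a site can also be a genuine variant at roughly 1% allele fraction. That is why the documentation says the annotation also catches true low allele fraction variants.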
For context, about 90% of the mitochondrial genome is susceptible to insertion into the nuclear genome. These insertion events occur throughout an individual's lifespan with regular frequency (usually during repair of double-strand breaks), and have been occurring for essentially all of human evolution. Unless you've taken laboratory steps to separate mtDNA from nuclear DNA, NuMTs can skew heteroplasmy allele fractions during mitochondrial variant calling. This annotation apparently helps flag NuMT-derived calls at the analysis stage.
Anyway, I am really having trouble understanding the algorithm, specifically the logic of modeling autosomal coverage as a Poisson distribution centered on the median autosomal coverage. Is the total genome length treated as the time interval, and the coverage at a specific position as the number of events?
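One textbook way to read the Poisson assumption is the Lander–Waterman coverage model. This is offered as an assumption about what the documentation means, not as a description of GATK's design. The model has N reads of length L whose start positions are uniformly random over a genome of length G, with edge effects ignored. A fixed base is covered by a read only if that read starts in the L positions ending at the base. The probability of that is L/G, so the depth D at the base is approximately Poisson:

```latex
\begin{aligned}
D &\sim \mathrm{Binomial}\!\left(N, \tfrac{L}{G}\right) \approx \mathrm{Poisson}(\lambda), \qquad \lambda = \frac{NL}{G},\\
P(D = k) &= \frac{e^{-\lambda}\lambda^{k}}{k!},\\
\lambda - \ln 2 &\le \operatorname{median}(D) < \lambda + \tfrac{1}{3}.
\end{aligned}
```

Under this reading, the "interval" is the window of L possible start positions that would cover a given base, and the "events" are reads starting in that window. Genome length enters only through λ. Because a Poisson median lies within about one unit of λ, the median depth is a reasonable plug-in for λ.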
Also, would this annotation only apply to whole genome sequencing? In whole exome sequencing, coverage isn't uniform by design, so I assume it would violate the key Poisson assumption that the "average event occurrence rate is constant over the interval".