Why Are the Samtools/Bcftools PV4 T-Tests One-Sided?
11.5 years ago
Casbon

From the mpileup page, we have the definition:

PV4: P-values for 1) strand bias (exact test); 2) baseQ bias (t-test); 3) mapQ bias (t); 4) tail distance bias (t)

Looking at the source for this t-test (I couldn't find any further documentation), we can see on line 61:

if (u1 <= u2) return 1.;


At this point, u1 and u2 are the two sample means being compared, so this t-test returns 1 whenever u1 is less than or equal to u2. For mapping quality, for example, it returns a p-value of 1 (no evidence against the null hypothesis that the means are equal) whenever the non-reference reads have a mean mapping quality at least as high as the reference reads. This means we only test whether mapping quality is lower in the non-reference reads.
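The effect of that early return can be illustrated with a minimal Python sketch. It is not the samtools/bcftools source: it assumes a pooled-variance Student t statistic computed from per-group counts, sums and sums of squares, and it assumes group 1 is the reference-allele reads and group 2 the non-reference reads. Under those assumptions `one_sided_p` mirrors `if (u1 <= u2) return 1.;` by reporting 1 whenever the non-reference mean is at least as high, while `two_sided_p` reports a difference in either direction. All helper names (`summarize`, `pooled_t`, `betai`, `one_sided_p`, `two_sided_p`) and the mapQ values are hypothetical. The t tail probability uses the regularized incomplete beta function, evaluated with a Numerical Recipes-style continued fraction so only the standard library is needed.

```python
import math

# Illustrative sketch only: NOT the samtools/bcftools source. All names are hypothetical.


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies one even-indexed and one odd-indexed coefficient.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Evaluate the continued fraction on whichever side converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def summarize(values):
    """Return (n, sum, sum of squares): the per-group summary the t statistic needs."""
    return len(values), float(sum(values)), float(sum(v * v for v in values))


def pooled_t(n1, s1, ss1, n2, s2, ss2):
    """Pooled-variance Student t statistic for mean1 - mean2, plus its degrees of freedom."""
    if n1 < 1 or n2 < 1 or n1 + n2 < 3:
        raise ValueError("need at least one value per group and three in total")
    u1, u2 = s1 / n1, s2 / n2
    # ss - n*u^2 is each group's sum of squared deviations from its own mean.
    pooled_var = ((ss1 - n1 * u1 * u1) + (ss2 - n2 * u2 * u2)) / (n1 + n2 - 2)
    se = math.sqrt(max(pooled_var, 0.0) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:  # no spread at all: t is either 0 or infinite
        return (0.0 if u1 == u2 else math.copysign(math.inf, u1 - u2)), n1 + n2 - 2
    return (u1 - u2) / se, n1 + n2 - 2


def two_sided_p(t, v):
    """P(|T| >= |t|) for Student's T with v degrees of freedom."""
    return betai(0.5 * v, 0.5, v / (v + t * t))


def one_sided_p(t, v):
    """Upper-tail p-value for H1: mean1 > mean2.

    Like the `if (u1 <= u2) return 1.;` early return, it reports 1 whenever t <= 0.
    """
    if t <= 0.0:
        return 1.0
    return 0.5 * two_sided_p(t, v)


if __name__ == "__main__":
    # Hypothetical mapQ values as (reference reads, non-reference reads).
    scenarios = {
        "non-ref lower": ([60, 60, 55, 60, 58, 60], [40, 60, 35, 50]),
        "non-ref higher": ([40, 60, 35, 50], [60, 60, 55, 60, 58, 60]),
    }
    for label, (ref_mapq, alt_mapq) in scenarios.items():
        t, v = pooled_t(*summarize(ref_mapq), *summarize(alt_mapq))
        print(f"{label}: t={t:.3f}  one-sided p={one_sided_p(t, v):.4g}  "
              f"two-sided p={two_sided_p(t, v):.4g}")
```

In the "non-ref higher" scenario the one-sided function reports 1 even though the two-sided test sees a difference; that is exactly the asymmetry the question is about.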

Why not use a two-sided t-test to test for a difference in means between the quantities of interest?

11.5 years ago
lh3

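Reads with fewer mismatches to the reference are mapped better, so reads carrying a non-reference allele are expected to have lower, not higher, mapping quality. baseQ is BAQ-adjusted, and mismatching bases tend to have lower BAQ, too. The same is true for the distance to the end of a read. In each case the expected bias has a known direction.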

In practice, using a one-tailed rather than a two-tailed test has a negligible effect on the final SNP calls.
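Assuming a Student t statistic $t$ with $\nu$ degrees of freedom (the pooled-variance form is an assumption, not something the documentation states), the two choices are related as follows, where $I_x(a, b)$ is the regularized incomplete beta function. The $t \le 0$ branch mirrors the `if (u1 <= u2) return 1.;` early return rather than the textbook one-sided value $1 - \tfrac{1}{2}p_{\text{two}}$:

$$
p_{\text{two}}(t) = I_{\nu/(\nu + t^2)}\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right),
\qquad
p_{\text{one}}(t) =
\begin{cases}
\tfrac{1}{2}\, p_{\text{two}}(t), & t > 0 \quad (u_1 > u_2),\\[4pt]
1, & t \le 0 \quad (u_1 \le u_2).
\end{cases}
$$

So for a difference in the expected direction, switching between the two tests only changes the p-value by a factor of 2; the tests disagree substantively only when the difference runs the "wrong" way.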