
In simple terms, QUAL is a (Phred-scaled) probability measure that the site is variant in at least one of the samples. Note that it is therefore not a good measure of whether a particular sample in a multisample VCF is variant; for that you are probably better off with GQ.
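If you do want a per-sample confidence, here is a minimal sketch that reads GQ straight out of one tab-separated VCF data line. It assumes a plain, uncompressed record with the standard layout (FORMAT in the 9th column, one column per sample after it). The function name `per_sample_gq` is hypothetical; for real work a proper VCF parsing library is a better choice.

```python
from typing import Dict, List, Optional


def per_sample_gq(vcf_line: str, sample_names: List[str]) -> Dict[str, Optional[int]]:
    """Return GQ per sample for a single VCF data line (hypothetical helper).

    Assumes the standard VCF column layout: FORMAT is column 9 (index 8) and
    sample columns follow in the same order as sample_names.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    result: Dict[str, Optional[int]] = {}
    for name, sample_field in zip(sample_names, fields[9:]):
        # VCF allows trailing FORMAT fields to be dropped, so zip() simply
        # stops early and a missing GQ is treated like an explicit '.'.
        values = dict(zip(format_keys, sample_field.split(":")))
        gq = values.get("GQ", ".")
        result[name] = None if gq == "." else int(gq)
    return result


line = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0/1:45\t0/0:12\t./."
print(per_sample_gq(line, ["A", "B", "C"]))  # {'A': 45, 'B': 12, 'C': None}
```

The site QUAL here (50) says little about sample B on its own; its GQ of 12 is the number to look at for that sample.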

So, say you know the probability that the site is variant in sample A from one VCF, P(A), and similarly for sample B from the other VCF, P(B). What you need for the combined VCF is P(A ∪ B) = P(A) + P(B) − P(A ∩ B). See that extra term that isn't available? That's the probability that the site is variant in both samples, and it depends on how strongly A and B are correlated. If the samples are unrelated, you can reasonably treat A and B as independent and use P(A ∩ B) = P(A)P(B) (simply dropping the term would overstate the combined probability, since the two events are not mutually exclusive). If the samples are highly related (e.g. the same family), P(A ∩ B) can be much larger than that, and the independence shortcut should definitely not be used. Pedigree-aware variant callers such as those in RTG incorporate this information into their scoring, and it is quite hard to bolt on after the fact, so you'll have to make simplifying assumptions depending on your samples.
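To make the role of the missing term concrete, here is a minimal sketch that combines two QUAL values with the inclusion-exclusion formula above. It assumes QUAL = −10·log10(1 − P(variant)), i.e. QUAL is the Phred-scaled probability that the site is *not* variant; the function names are hypothetical. It shows the two extremes: unrelated samples (independence, P(A ∩ B) = P(A)P(B)) and maximal relatedness (P(A ∩ B) = min(P(A), P(B))).

```python
import math


def p_variant(qual: float) -> float:
    # Assumption: QUAL = -10*log10(1 - P(variant)).
    return 1.0 - 10.0 ** (-qual / 10.0)


def to_qual(p: float) -> float:
    # Phred-scale the probability that the site is NOT variant.
    p_not = 1.0 - p
    return math.inf if p_not <= 0.0 else -10.0 * math.log10(p_not)


def combined_qual(qual_a: float, qual_b: float, p_both: float) -> float:
    """QUAL for 'variant in A or B' via P(A ∪ B) = P(A) + P(B) - P(A ∩ B)."""
    p_a, p_b = p_variant(qual_a), p_variant(qual_b)
    tol = 1e-12
    # P(A ∩ B) must lie within the bounds implied by P(A) and P(B).
    if not (max(0.0, p_a + p_b - 1.0) - tol <= p_both <= min(p_a, p_b) + tol):
        raise ValueError("P(A ∩ B) is inconsistent with P(A) and P(B)")
    return to_qual(p_a + p_b - p_both)


def combined_qual_independent(qual_a: float, qual_b: float) -> float:
    # Unrelated samples: P(A ∩ B) = P(A) * P(B).
    p_a, p_b = p_variant(qual_a), p_variant(qual_b)
    return combined_qual(qual_a, qual_b, p_a * p_b)


def combined_qual_fully_dependent(qual_a: float, qual_b: float) -> float:
    # Strongest positive dependence: P(A ∩ B) = min(P(A), P(B)),
    # so P(A ∪ B) = max(P(A), P(B)).
    p_a, p_b = p_variant(qual_a), p_variant(qual_b)
    return combined_qual(qual_a, qual_b, min(p_a, p_b))


print(round(combined_qual_independent(20, 30), 6))      # 50.0
print(round(combined_qual_fully_dependent(20, 30), 6))  # 30.0
```

Under independence the Phred scores simply add (1 − P(A ∪ B) = (1 − P(A))(1 − P(B))), while at full dependence the combined QUAL collapses to the larger of the two. Positively correlated (related) samples land somewhere in between, which is exactly the information a single-sample VCF doesn't give you.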