Question: How is the QUAL score calculated in a multigenome VCF file?
4.6 years ago by
MAPK1.7k wrote:

Suppose I have two VCF files with one sample in each and want to merge them using Python scripts. I want to know whether the QUAL scores in these files should be summed or averaged to give the final QUAL score. Can someone also please explain how this score differs when merging two files with different numbers of samples?

4.6 years ago by
Len Trigg1.5k wrote:

In simple terms, QUAL is defined as a (Phred-scaled) measure of the probability that the site is variant in any of the samples. (Note that it is therefore not a good measure of whether a particular sample in a multisample VCF is variant; for that you are probably better off with GQ.)
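To make the QUAL versus GQ distinction concrete, here is a minimal sketch that pulls both out of a single VCF data line using only the standard tab-separated column layout (QUAL is the sixth column, FORMAT the ninth, samples follow). The helper name `qual_and_gq` and the example record are made up for illustration, and the sketch assumes every sample column follows the FORMAT key order.

```python
def qual_and_gq(vcf_line):
    """Return (site-level QUAL, list of per-sample GQ) for one VCF data line.

    Hypothetical helper: assumes a tab-separated record with FORMAT and
    sample columns present. Missing values ('.') are returned as None.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 6 (index 5) is the site-level QUAL.
    qual = None if fields[5] == "." else float(fields[5])
    # Column 9 (index 8) lists the per-sample keys, e.g. "GT:GQ".
    keys = fields[8].split(":")
    gqs = []
    for sample in fields[9:]:
        values = dict(zip(keys, sample.split(":")))
        gq = values.get("GQ", ".")
        gqs.append(None if gq == "." else float(gq))
    return qual, gqs


# Made-up two-sample record: one QUAL for the site, one GQ per sample.
line = "chr1\t12345\t.\tA\tG\t57.3\tPASS\t.\tGT:GQ\t0/1:48\t0/0:12"
qual, gqs = qual_and_gq(line)
```

Here the single QUAL value describes the site as a whole, while each sample carries its own GQ.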

Say you know the probability that the site is variant for sample A from one VCF, P(A), and similarly for the second sample B from the other VCF, P(B). What you need for the combined VCF is P(A ∪ B) = P(A) + P(B) - P(A ∩ B). See that extra term that isn't available? That's the probability that the site is variant in both samples, and it depends on how independent A and B are. If the samples are essentially unrelated, you can possibly get away with treating them as independent, so that P(A ∩ B) = P(A) P(B), whereas if the samples are highly related (e.g. same family), the dependence should definitely not be ignored. Variant callers like the RTG pedigree-aware callers incorporate this information into their scoring, but it can be quite hard to bolt on after the fact, so you'll have to make simplifying assumptions depending on your samples.
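As a rough sketch of what such a simplifying assumption could look like in Python: the code below treats QUAL as Phred-scaled, i.e. QUAL = -10 log10(1 - P(variant)), and applies P(A ∪ B) = P(A) + P(B) - P(A ∩ B). If no joint probability is supplied it assumes the two samples are independent, so P(A ∩ B) = P(A) P(B). The function names, the `p_both` parameter and the `max_qual` cap are all hypothetical, and this is not how any particular caller or merge tool computes QUAL.

```python
import math


def qual_to_prob(qual):
    """Phred-scaled QUAL -> probability that the site is variant."""
    return 1.0 - 10.0 ** (-qual / 10.0)


def prob_to_qual(p_variant, max_qual=3000.0):
    """Probability that the site is variant -> Phred-scaled QUAL.

    max_qual is an arbitrary cap (an assumption) used when the error
    probability underflows to zero.
    """
    p_error = 1.0 - p_variant
    if p_error <= 0.0:
        return max_qual
    return -10.0 * math.log10(p_error)


def merge_qual(qual_a, qual_b, p_both=None):
    """Combine two single-sample QUALs into one site-level QUAL.

    p_both is P(A and B), the probability that the site is variant in
    both samples. If it is not given, independence is ASSUMED, which is
    only reasonable for unrelated samples.
    """
    p_a = qual_to_prob(qual_a)
    p_b = qual_to_prob(qual_b)
    if p_both is None:
        p_both = p_a * p_b  # independence assumption
    p_union = p_a + p_b - p_both
    return prob_to_qual(min(p_union, 1.0))


independent = merge_qual(30.0, 20.0)
# Extreme dependence: whenever B is variant, A is too, so P(A and B) = P(B).
dependent = merge_qual(30.0, 20.0, p_both=qual_to_prob(20.0))
```

Under the independence assumption the result works out to the sum of the two QUALs (about 50 here), because the probabilities of "not variant" multiply. In the fully dependent extreme it collapses to the larger of the two (about 30), so for related samples the true value lies somewhere in between; a plain average is not what the formula gives in either case. Going through probabilities like this loses floating-point precision for very high QUAL values (roughly above 150), so treat it as an illustration rather than production code.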
