Distribution underlying VCF QUAL - all sites or just those in VCF?
7.1 years ago

I'm wondering about the distribution which underlies a VCF QUAL score. Specifically, I understand that

"A site with QUAL=20 has a 99% chance of being a true mutation," or more precisely (I think)

"The collection of sites with QUAL=20 has the property that 99% of it consists of true mutations."

My question is: does "the collection of sites with QUAL=20" refer only to the sites listed in the VCF, or to all the sites sequenced in the experiment? I think this actually affects the definition of QUAL, and if it is the latter, it might mean we could expect millions of false positives in whole-genome variant calls.
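To make the concern concrete, here is a minimal sketch of the arithmetic, assuming QUAL is Phred-scaled (QUAL = -10 * log10 of the probability that the call is wrong), which is where the "QUAL=20 means 99%" reading comes from. The function names and the site counts are hypothetical and chosen only for illustration; they are not taken from any real call set or from mpileup itself.

```python
def qual_to_error_prob(qual: float) -> float:
    """Convert a Phred-scaled QUAL into the probability that the call is wrong."""
    return 10 ** (-qual / 10)


def expected_false_positives(n_sites: int, qual: float) -> float:
    """Expected number of wrong calls if n_sites calls each carry this QUAL."""
    return n_sites * qual_to_error_prob(qual)


# Hypothetical counts, for illustration only.
sites_listed_in_vcf = 4_000_000        # assumed number of records in the VCF
sites_sequenced = 3_000_000_000        # assumed number of sites covered

print(qual_to_error_prob(20))
print(expected_false_positives(sites_listed_in_vcf, 20))
print(expected_false_positives(sites_sequenced, 20))
```

Under these assumed counts, the two readings differ by the ratio of the two population sizes, which is why the choice of sample space matters so much for the expected number of false positives.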

Thanks for any help! And sorry if I'm overlooking something, but I haven't found any documentation that addresses this point in a clear way.

snp genome • 1.3k views
7.1 years ago

Actually, perhaps a decent sample space would be all the distinct calls ever made by the variant caller (in this case mpileup), by anyone, in all of history. That is, 99% of the sites ever reported with QUAL=20 should be true mutations. A more formal definition would perhaps involve all possible return states of the variant caller...
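This frequentist reading can be checked empirically if a truth set is available. The sketch below assumes a hypothetical list of `(qual, is_true)` pairs, one per reported call, and compares the observed fraction of true calls at each QUAL with the fraction a Phred-scaled score would predict. The function name and the input data are made up for illustration and do not come from any real caller output.

```python
from collections import defaultdict


def empirical_calibration(calls):
    """Map each QUAL to (observed fraction true, Phred-predicted fraction true).

    `calls` is an iterable of (qual, is_true) pairs, one per reported call.
    """
    counts = defaultdict(lambda: [0, 0])  # qual -> [total calls, true calls]
    for qual, is_true in calls:
        counts[qual][0] += 1
        counts[qual][1] += int(is_true)
    return {
        qual: (n_true / n_total, 1 - 10 ** (-qual / 10))
        for qual, (n_total, n_true) in sorted(counts.items())
    }


# Hypothetical truth-labelled calls: 100 calls at QUAL=20, one of them wrong.
calls = [(20, True)] * 99 + [(20, False)]
for qual, (observed, predicted) in empirical_calibration(calls).items():
    print(qual, observed, predicted)
```

If the caller is well calibrated over the chosen sample space, the observed and predicted fractions should roughly agree at each QUAL; which calls go into `calls` is exactly the sample-space question raised above.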
