Question: Prior probability during SNP calling
11 months ago, CY (United States) wrote:

Most SNP callers, such as HaplotypeCaller, apply a Bayesian method to call SNPs. HaplotypeCaller uses the 1KG / dbSNP data sets as the prior probability.

This got me thinking about what prior 1KG itself used. It turns out that 1KG set 0.001 per base as the prior. This seems reasonable, given that data from before 1KG show an average SNP occurrence of about 0.001 per base.
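
To make the role of that prior concrete, here is a minimal Python sketch of a Bayesian genotype call at a single diploid site. It is not the actual HaplotypeCaller or 1KG code: the function name `genotype_posteriors`, the simple binomial read model, the 1% error rate, and the way the 0.001 prior is split across genotypes (theta for heterozygous, theta/2 for homozygous alternate) are all assumptions made for illustration.

```python
from math import comb


def genotype_posteriors(n_ref, n_alt, theta=0.001, err=0.01):
    """Posterior over diploid genotypes given ref/alt read counts.

    theta: assumed per-base prior probability of a variant (0.001).
    err:   assumed per-read error rate (hypothetical value).
    """
    n = n_ref + n_alt
    # Assumed prior split: het = theta, hom-alt = theta / 2, rest is hom-ref.
    priors = {
        "ref/ref": 1.0 - 1.5 * theta,
        "ref/alt": theta,
        "alt/alt": theta / 2.0,
    }
    # Probability that a single read shows the alt allele under each genotype.
    p_alt = {"ref/ref": err, "ref/alt": 0.5, "alt/alt": 1.0 - err}
    # Bayes' rule: prior times binomial likelihood, then normalise.
    unnorm = {
        g: priors[g] * comb(n, n_alt) * p_alt[g] ** n_alt * (1.0 - p_alt[g]) ** n_ref
        for g in priors
    }
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


if __name__ == "__main__":
    # One alt read out of 20: the strong hom-ref prior dominates.
    print(genotype_posteriors(n_ref=19, n_alt=1))
    # Balanced counts: the data overwhelm the small het prior.
    print(genotype_posteriors(n_ref=10, n_alt=10))
```

The point of the sketch is only that a small prior such as 0.001 makes the caller demand more read evidence before it reports a variant; once the evidence is strong, the exact prior value matters much less.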

The question then remains: what did projects before 1KG use as a prior? Or did those early projects not use Bayesian methods at all? If not, what methods did they use?


Prior to the HaplotypeCaller, everyone was using UnifiedGenotyper with the GATK, which obviously behaves in a different way than HaplotypeCaller. This was back when the only large population-based dataset available was the International HapMap 270. I even ordered this on multiple CDs, which are still sitting in an office in the UK.

Back then, I don't recall many other variant callers. SAMtools was certainly around.

Your interest in the prior probability matches my own, but I have not done much work in this particular area. I believe, nevertheless, that the prior probabilities going into each variant call are strongly biased by read depth, and that these probabilities are also responsible for the clear-cut variants that are sometimes missed by the GATK. This is why I believe the GATK needs to do more work on the influence of downsampling (read depth) and how it affects variant calling. At the moment, as far as I am aware, all variant callers 'randomly' downsample to a read depth of 1000 or 500 without considering how this may affect calling.

written 11 months ago by Kevin Blighe

I believe that UG is the same as HC in that it uses population frequencies from previous studies as the prior.

In terms of downsampling, it seems reasonable for SNP calling. However, I saw that Mutect (it still uses HC) also sets a default maximum coverage of 1000 during somatic calling. That does not make sense to me. Many somatic calling projects have sequencing depth above 1000, and with such a parameter low-frequency somatic variants could be missed.
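
As a rough illustration of why a depth cap matters for low-frequency variants, the sketch below computes the binomial probability of seeing at least a given number of variant-supporting reads at a given depth. It ignores sequencing error and is not how Mutect works internally; the function name `prob_at_least`, the 0.1% allele fraction, and the 5-read threshold are hypothetical choices for the example.

```python
from math import comb


def prob_at_least(k, depth, vaf):
    """Binomial probability of observing at least k alt reads.

    depth: number of reads kept at the site (after any downsampling).
    vaf:   true variant allele fraction (assumed, error-free model).
    """
    # Sum the probabilities of seeing 0 .. k-1 alt reads, then take the complement.
    p_fewer = sum(
        comb(depth, i) * vaf ** i * (1.0 - vaf) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    vaf = 0.001       # hypothetical 0.1% somatic allele fraction
    min_alt_reads = 5  # hypothetical evidence threshold
    for depth in (1000, 5000):
        p = prob_at_least(min_alt_reads, depth, vaf)
        print(f"depth={depth}: P(at least {min_alt_reads} alt reads) = {p:.4f}")
```

Under these assumed numbers, reaching the threshold is very unlikely once reads are capped at 1000, and becomes better than even at 5000, which is the concern with a fixed cap on deeply sequenced somatic samples.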

written 11 months ago by CY

I can see a reply in my feed but not here (?) - You can read more on what I mean about read depth here: C: Lack of consensus between NGS & Sanger sequencing on indels/mutations

written 11 months ago by Kevin Blighe