Though I may not have the right answer for you, I'd like to share my opinion as an answer.
1) According to my analysis of all somatic mutations in the latest ICGC data release (the landscape of somatic mutations, with raw data provided as supplementary material), the Poisson distribution does not apply to somatic mutation counts. Judging from the number of mutations per sample, the counts are more likely to follow a normal or Weibull distribution, although it is hard to say exactly which distribution fits best.
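One quick way to check whether a Poisson model is plausible is the index of dispersion: a Poisson variable has variance equal to its mean, so the variance-to-mean ratio of the per-sample counts should be close to 1. The sketch below is a minimal illustration, not the analysis described above; the counts are made-up placeholder values, not ICGC data.

```python
import statistics

def dispersion_index(counts):
    """Variance-to-mean ratio of per-sample mutation counts.

    Under a Poisson model the variance equals the mean, so this ratio
    should be close to 1; values far above 1 indicate overdispersion.
    """
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("mean count is zero; dispersion index undefined")
    # statistics.variance is the sample variance (divides by n - 1)
    return statistics.variance(counts) / mean

# Hypothetical per-sample counts, for illustration only (not ICGC data).
counts = [1506, 2100, 2900, 3668, 4200, 5600, 7300, 8816]
print(round(dispersion_index(counts)))
```

For a formal test, (n - 1) times the dispersion index can be compared against a chi-square distribution with n - 1 degrees of freedom.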
Let's have a look at the per-sample counts of all somatic mutations in the ICGC WGS data:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0    1506    3668   12450    8816  722400
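The table above has the layout of R's summary() output. A minimal Python sketch of the same six-number summary, assuming the per-sample counts are already in a list, could look like this (the function name is hypothetical):

```python
import statistics

def r_style_summary(values):
    """Six-number summary laid out like R's summary() on a numeric vector.

    statistics.quantiles(method="inclusive") uses the same linear
    interpolation as R's default quantile type 7.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.mean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }

print(r_style_summary([1, 2, 3, 4]))
```

Note that R prints summary() values rounded to about four significant digits by default, which is probably why the Mean and Max above appear as round numbers.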
I then took the samples between the 1st and 3rd quartiles to find the distribution that fits them. This subset looks as follows (the x axis shows individuals in sequential order):
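Selecting the interquartile subset can be sketched as below, assuming quartiles are computed with R's default (type 7) interpolation and that the bounds are inclusive; both are assumptions, since the exact rule used is not stated. The list comprehension keeps the original sample order, which matches plotting individuals in sequential order.

```python
import statistics

def interquartile_subset(values):
    """Keep values between the 1st and 3rd quartiles (inclusive), in original order."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [v for v in values if q1 <= v <= q3]

print(interquartile_subset([8, 1, 5, 3, 7, 2, 6, 4]))
```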
After fitting the distributions, the evaluation was as follows:
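The evaluation method is not stated here, so the following is only a sketch of one common approach, assuming maximum-likelihood fits compared by AIC (lower is better). The Weibull shape is found by bisection on its likelihood equation, and all values must be positive. Comparing a discrete Poisson probability with continuous normal and Weibull densities this way is a rough heuristic, and the counts are hypothetical.

```python
import math
import statistics

def poisson_loglik(xs):
    lam = statistics.mean(xs)  # MLE of the Poisson rate
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)

def normal_loglik(xs):
    n = len(xs)
    mu = statistics.mean(xs)
    var = sum((x - mu) ** 2 for x in xs) / n  # MLE variance (divide by n)
    return -0.5 * n * math.log(2 * math.pi * var) - 0.5 * n

def weibull_mle(xs, lo=1e-3, hi=100.0, tol=1e-10):
    """Return (shape, scale) maximizing the Weibull likelihood.

    Data are rescaled by max(xs) so y**k cannot overflow; the shape
    estimate does not depend on this rescaling.
    """
    c = max(xs)
    ys = [x / c for x in xs]
    mean_log = statistics.mean(math.log(y) for y in ys)

    def g(k):
        # Shape equation; g is increasing in k and its root is the MLE.
        w = [y ** k for y in ys]
        return sum(wi * math.log(y) for wi, y in zip(w, ys)) / sum(w) - 1 / k - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    scale = c * statistics.mean(y ** k for y in ys) ** (1 / k)
    return k, scale

def weibull_loglik(xs):
    k, lam = weibull_mle(xs)
    return sum(math.log(k / lam) + (k - 1) * math.log(x / lam) - (x / lam) ** k for x in xs)

def compare_fits(xs):
    """AIC = 2p - 2 log L for each candidate (p = number of fitted parameters)."""
    fits = {
        "poisson": (poisson_loglik(xs), 1),
        "normal": (normal_loglik(xs), 2),
        "weibull": (weibull_loglik(xs), 2),
    }
    return {name: 2 * p - 2 * ll for name, (ll, p) in fits.items()}

# Hypothetical interquartile counts, for illustration only (not ICGC data).
counts = [1506, 2100, 2900, 3668, 4200, 5600, 7300, 8816]
print(compare_fits(counts))
```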
2) Based on my understanding of Bayesian inference, your data don't need to follow a specific distribution. A Bayesian approach combines a prior with the observed data to obtain the posterior, so you do need to specify the prior.
Nowadays there are many excellent tools for somatic variant calling. Unless you are developing a novel method, I'd encourage you to use existing tools; they are reliable and widely used in the community. What's more, somatic mutations come in different types, e.g. substitutions, indels and structural variants. Are you going to detect all of them? It seems you are going to call somatic mutations based on statistical information alone, whereas, as far as I understand, current tools work by mapping the cancer sequencing reads to the reference genome.
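To illustrate why the prior matters in 2), here is a minimal grid-approximation sketch: the posterior is the prior times the likelihood, renormalized over a grid of parameter values. The binomial likelihood, the parameter theta and both priors are hypothetical choices for the example, not part of any particular variant caller.

```python
def grid_posterior(k, n, prior, grid_size=1001):
    """Posterior over theta in [0, 1] after k successes in n binomial trials.

    prior maps theta to a non-negative weight. The binomial coefficient
    is omitted because it cancels when the posterior is normalized.
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    unnorm = [prior(t) * t ** k * (1 - t) ** (n - k) for t in thetas]
    total = sum(unnorm)
    return thetas, [u / total for u in unnorm]

def posterior_mean(thetas, post):
    return sum(t * p for t, p in zip(thetas, post))

# Same data (3 successes out of 10), two different priors.
thetas, flat = grid_posterior(3, 10, prior=lambda t: 1.0)
_, skewed = grid_posterior(3, 10, prior=lambda t: (1 - t) ** 20)
print(posterior_mean(thetas, flat), posterior_mean(thetas, skewed))
```

With the flat prior the posterior mean is close to 4/12, while the prior concentrated near zero pulls it down to about 4/32, even though the data are identical.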
Posted 2.7 years ago by solo7773