Estimating the expected number of reads that overlap a [heterozygous] snp in one window from the read depth, read length and resolution
Entering edit mode
8.7 years ago
Dataman ▴ 370


I am writing a Python program to simulate the cancer genome with different events such as deletion, amplification, copy neutral LOH, ...

The simulator has two parts: simulation of the read depth (this part is already implemented) and simulation of beta allele fraction (BAF) (under development).

To develop the second part, I am going to use the binomial distribution, binom(n, p), where parameter 'n' is the expected number of reads that overlap a snp and parameter 'p' is calculated based on the events. For instance, if there is no aberration, I expect p to be 0.5 (0.5 for seeing the reference allele and 0.5 for seeing the alternative (b) allele) etc. The output of the binomial distribution is then the number of b-allele at each position.

My question is about estimating the parameter 'n' of the binomial distribution from the simulated read depth track (part one of the program). For instance, if we assume that the resolution of the read depth track is 1000 bp (which is the same as step in a .wig file with fixed step) and the read_length is 150 bp and if we see 400 reads (in one window of length 1000 bp.), how many of those reads overlap one [heterozygous] snp in that window?

I would like to thank you in advance for your suggestions on how to approach this question. Any idea is much appreciated.

Here is how two tracks of real data look like. First is the read depth and the second is the mirrored BAF track:

IGV representation of read depth and mBAF tracks

  • binomial distribution is used to add some noise to the data.
statistics allele simualtion snp baf • 2.8k views
Entering edit mode
8.7 years ago
Dataman ▴ 370

Here is my own answer:

n = read_length * (read_depth / (read_length + resolution) )

For instance if we have:

  • read_length = 100
  • resolution = 1000
  • read_depth at the current window = 400


n = 100 * (400 / 1100) ≈ 40

Here is a snapshot of the results of the simulation:

The result of simulation


Login before adding your answer.

Traffic: 1659 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6