Question: Estimating the expected number of reads that overlap a [heterozygous] snp in one window from the read depth, read length and resolution
1
gravatar for Dataman
5.2 years ago by
Dataman310
Finland
Dataman310 wrote:

Hi,

I am writing a Python program to simulate the cancer genome with different events such as deletion, amplification, copy neutral LOH, ...

The simulator has two parts: simulation of the read depth (this part is already implemented) and simulation of beta allele fraction (BAF) (under development).

To develop the second part, I am going to use the binomial distribution, `binom(n, p)`, where parameter 'n' is the expected number of reads that overlap a snp and  parameter 'p' is calculated based on the events. For instance, if there is no aberration, I expect p to be 0.5 (0.5 for seeing the reference allele and 0.5 for seeing the alternative (b) allele) etc. The output of the binomial distribution is then the number of b-allele at each position.

My question is about estimating the parameter 'n' of the binomial distribution from the simulated read depth track (part one of the program). For instance, if we assume that the resolution of the read depth track is 1000 bp (which is the same as step in a .wig file with fixed step) and the read_length is 150 bp and if we see 400 reads (in one window of length 1000 bp.), how many of those reads overlap one [heterozygous] snp in that window?

I would like to thank you in advance for your suggestions on how to approach this question. Any idea is much appreciated.

Here is how two tracks of real data look like. First is the read depth and the second is the mirrored BAF track:

IGV representation of read depth and mBAF tracks

* binomial distribution is used to add some noise to the data.

ADD COMMENTlink modified 5.2 years ago • written 5.2 years ago by Dataman310
1
gravatar for Dataman
5.2 years ago by
Dataman310
Finland
Dataman310 wrote:

Here is my own answer:

n = read_length * (read_depth / (read_length + resolution) )

For instance if we have:
read_length = 100
resolution = 1000
read_depth at the current window = 400

Then:

n = 100 * (400 / 1100) ≈ 40

 

Here is a snapshot of the results of the simulation:

The result of simulation

 

ADD COMMENTlink modified 5.2 years ago • written 5.2 years ago by Dataman310
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1840 users visited in the last hour