Hi, I'm taking a class and we have a homework problem that references shotgun sequencing, and I'm stuck. I'm not asking for the answer; I just want some clarification. Here's the problem:
Consider this simplified model of whole-genome shotgun sequencing:
- many copies of a genome of length G are broken at many points uniformly at random (the probability of breakage at each location in the genome is uniform over the locations)
- the location of each fragment so produced is independent of the locations of the other fragments
- edge effects can be ignored
- the fragments are cloned into vectors and this works perfectly
- from each cloned fragment, exactly L nucleotides are sequenced from one end of the fragment, yielding a total of R sequence reads of length L (where L is much, much smaller than G)
a) The coverage C of a shotgun sequencing is the expected number of times each nucleotide in the genome has been sequenced during the procedure. Provide an expression for C in terms of other defined quantities.
Answer: C= RL/G
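To sanity-check that expression, here is a minimal Python sketch. The function name and the values of G, L, and R are made up for illustration and are not part of the problem.

```python
def coverage(num_reads, read_length, genome_length):
    """Expected number of times each nucleotide is sequenced: C = R * L / G."""
    return num_reads * read_length / genome_length

# Hypothetical values, chosen only so that L is much smaller than G
G = 1_000_000   # genome length in nucleotides
L = 500         # read length in nucleotides
R = 10_000      # number of reads

C = coverage(R, L, G)
print(C)  # 5.0
```

So with these numbers each nucleotide is expected to be sequenced 5 times on average.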
This is the part I'm having trouble with:
b) What is the probability that a specific location in the genome will not be covered by any of the R reads (your answer must be a function of R and C)? Using this probability, write an expression for the expected number of nucleotides in the genome that remain unsequenced during this procedure (your answer must be a function of G and C).
Answer: From Wikipedia, I found that the probability of not covering a given location on the target with R reads is (1 - C/R)^R, which is approximately e^(-C) when R is large. How do I approach the second part?
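For what it's worth, here is a rough numeric check of how close those two forms are. The reasoning I'm assuming is that each read covers a given location with probability L/G = C/R, so R independent reads all miss it with probability (1 - C/R)^R. The function names and the values of C and R below are hypothetical, picked only for illustration.

```python
import math

def p_uncovered_exact(C, R):
    """Probability that all R independent reads miss a given location."""
    return (1 - C / R) ** R

def p_uncovered_approx(C):
    """Large-R limit of the expression above."""
    return math.exp(-C)

C = 5.0  # hypothetical coverage
for R in (10, 100, 10_000, 1_000_000):
    # The exact value should approach the exponential as R grows
    print(R, p_uncovered_exact(C, R), p_uncovered_approx(C))
```

The gap between the two columns shrinks as R increases, which is why the e^(-C) form is commonly used when the number of reads is large.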