How Much Of The Genome Will Remain Un-Sequenced At A Given Coverage?
10.5 years ago
kerpalha ▴ 10

Hi, I'm taking a class and we have a homework problem on shotgun sequencing that I'm stuck on. I'm not asking for the answer; I just want some clarification. Here's the problem:

Consider this simplified model of whole-genome shotgun sequencing:

  1. many copies of a genome of length G are broken at many points uniformly at random (the probability of breakage at each location in the genome is uniform over the locations)
  2. the location of each fragment so produced is independent of the locations of the other fragments
  3. edge effects can be ignored
  4. the fragments are cloned into vectors and this works perfectly
  5. from each cloned fragment, exactly L nucleotides are sequenced from one end of the fragment, yielding a total of R sequence reads of length L (where L is much, much smaller than G)

a) The coverage C of a shotgun sequencing is the expected number of times each nucleotide in the genome has been sequenced during the procedure. Provide an expression for C in terms of other defined quantities.

Answer: C = RL/G

This is the part I'm having trouble with:

b) What is the probability that a specific location in the genome will not be covered by any of the R reads (your answer must be a function of R and C)? Using this probability, write an expression for the expected number of nucleotides in the genome that remain unsequenced during this procedure (your answer must be a function of G and C).

Answer: From Wikipedia, I found that the probability of not covering a given location with R reads is (1 - C/R)^R, which is approximately e^(-C) when R is large. How do I approach the second part?

10.5 years ago

Search for the Lander-Waterman model on Google and you will find many clear descriptions of the model and of how the various values are derived (like this one).

You can think of the probability of a nucleotide not being sequenced as the expected fraction of the genome that is not covered, because under the model the reads are placed independently and every position is treated the same. Each of the G positions is missed with probability exp(-C), so the expected number of non-sequenced nucleotides is N = G * exp(-C).
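If you want to convince yourself numerically, here is a minimal Python sketch under the assumptions of the simplified model in the question. The function names (`expected_unsequenced`, `simulate_uncovered`) and the example values of G, R and L are made up for illustration, and the genome is treated as circular as one way of "ignoring edge effects".

```python
import math
import random


def expected_unsequenced(G, R, L):
    """Return (C, exact miss probability, approximate miss probability, expected uncovered bases)."""
    C = R * L / G                   # coverage, C = RL/G
    p_exact = (1 - L / G) ** R      # same as (1 - C/R)^R, since L/G = C/R
    p_approx = math.exp(-C)         # limit of the expression above for large R
    return C, p_exact, p_approx, G * p_approx


def simulate_uncovered(G, R, L, seed=0):
    """Place R reads of length L uniformly at random and count uncovered positions.

    Assumption: the genome is circular, so reads running past the end wrap around.
    Requires L < G.
    """
    rng = random.Random(seed)
    diff = [0] * (G + 1)            # difference array: +1 where a read starts, -1 where it ends
    for _ in range(R):
        start = rng.randrange(G)
        end = start + L
        diff[start] += 1
        if end <= G:
            diff[end] -= 1
        else:                       # read wraps around the end of the genome
            diff[G] -= 1
            diff[0] += 1
            diff[end - G] -= 1
    uncovered = 0
    depth = 0
    for i in range(G):
        depth += diff[i]            # running sum gives the read depth at position i
        if depth == 0:
            uncovered += 1
    return uncovered


if __name__ == "__main__":
    G, R, L = 1_000_000, 20_000, 100    # illustrative values only, giving C = 2
    C, p_exact, p_approx, n_expected = expected_unsequenced(G, R, L)
    print("C =", C)
    print("(1 - C/R)^R =", p_exact, " exp(-C) =", p_approx)
    print("expected uncovered =", n_expected)
    print("simulated uncovered =", simulate_uncovered(G, R, L))
```

A single simulated count will not match G * exp(-C) exactly, since that formula is an expectation and neighbouring positions are covered or missed together, but it should land close to it for a genome this size.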
