Question: Calculating sequencing errors
0

There is a circular genome of length L. You have a sequencing machine which reads a consecutive subsequence of length exactly n of this genome starting from a position chosen uniformly at random. However, there is a problem: the machine makes errors -- each nucleotide is replaced with an incorrect one with probability p.

You have a budget for k reads. After the sequencing is done you count nucleotides read from each genome position. After that for each genome position you decide which nucleotide is located at this position in the genome by the following rules:

The desired nucleotide is the one with the largest number of occurrences at this position; If there are several suitable nucleotides you choose one of them uniformly at randomly. In particular, if a position has never been covered by a read, then every nucleotide occurred 0 times at this position and a nucleotide is chosen uniformly at random.

You are given several tests. Each test is represented by the length of the genome L, the length of a read n, the probability of an error p and the total number of reads k. Find the expected number of positions with incorrect nucleotides.

The first line of the input contains the number of test cases t. Each test case is specified by four numbers L, n, p and k.

``````SAMPLE INPUT
4 2 0.000 1
4 2 0.000 2
10 3 0.050 4

SAMPLE OUTPUT
1.50000000000
0.75000000000
2.14491825000
``````

In the first test case, the length of the genome is 4. The sequencer gives us only one read and it always makes no mistakes, so it provides us with true information about 2 nucleotides. For each other nucleotide the probability of the correct choice is 0.25, so the answer is `2⋅(1−0.25)=1.5.`

In the second test case, we can use the same sequences twice, so we can know true information about 2, 3 or 4 nucleotides.

Knowing information only about 2 nucleotides happens with the probability 0.25 and the expected number of mistakes is 1.5. Knowing information about exactly 3 nucleotides happens with the probability 0.5 and the expected number of mistakes is 0.75. We know information about all nucleotides with the probability 0.25 and the expected number of mistakes is 0. So the answer for the second test case is `0.25⋅1.5+0.5⋅0.75+0.25⋅0= 0.75`.

CAN YOU EXPLAIN HOW WAS THIRD SAMPLE OUTPUT ACHIEVED FROM THIRD SAMPLE INPUT?

sequencing • 455 views
ADD COMMENTlink
modified 6 months ago • written 6 months ago by khanirfanbioinformatics0

You know your TA is probably in this forum, right?

ADD REPLYlink written 6 months ago by Asaf6.1k

I graduated 2 years ago. Besides, I have asked for an explaination for an example, I have not uploaded my test files.

ADD REPLYlink written 6 months ago by khanirfanbioinformatics0

Sorry, it just sounded like homework

ADD REPLYlink written 6 months ago by Asaf6.1k

It's question 2 from the [Stepik] Bioinformatics Contest 2019 Qualification Round which started 7 days ago .. The deadline passed 4 hours ago today.

ADD REPLYlink modified 6 months ago • written 6 months ago by os461430

I am asking for the explanation of question.

ADD REPLYlink written 6 months ago by khanirfanbioinformatics0
Please log in to add an answer.

Content
Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1072 users visited in the last hour