Question regarding searching problems
1
0
Entering edit mode
6.5 years ago

I have completed part Q1 of this question and I am not understanding what Q2 is asking for?

Q1. Consider the searching problem:

Input: A Genome G ∈ Σ∗ = A string (sequence of characters) over an alphabet of characters Σ = {a, t, c, g}; A reverse-complement restriction pattern k-cutters of length k, pk ∈ Σ∗, such that Reverse(Complement(pk)) = pk.

Output: An index i such that pk = G[i..i + k − 1] or the special value NIL if v does not appear in G. Write pseudocode for linear search, which scans through the sequence, looking for pk. How will you optimize it, if you need to search for multiple pk’s?

LinearSearch(G, pk): 
   i = 0 
   while i < length(G): 
      if G[i] == pk[0]: 
      i2 = i 
      j = 0 
      while G[i2] == pk[j] and j < k: 
         i2 = i2 +1 
         j = j + 1  
if j == k: 
   return i 
   i = i + 1 
if i == length(G): 
   return NIL

Q2. In the same setting as the one described above: Let |G| = n and |pk| = k, what is the probability of a pk occurring in G, assuming that G is a random string with all characters occurring equiprobably. Consider the distance between two consecutive occurrences of pk in G: What is its average value? Variance?

genome string • 1.6k views
ADD COMMENT
1
Entering edit mode
6.5 years ago
kloetzl ★ 1.1k

Instead of answering your homework for you, let me rephrase Q2:

Given pk and a random string of equal length g what is the probability of them to be the same string? How many gs are in G? Can all of them be considered independent? Accumulate these numbers into the probability of a string pk occurring in a random genome.

The second part seems non-obvious to me, but nonetheless doable if you find the right formula for the average distance.

ADD COMMENT

Login before adding your answer.

Traffic: 2555 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6