Question

What is happening in the Nussinov Algorithm?

0

Entering edit mode

4.5 years ago

M.O.L.S ▴ 100

I am trying to understand the Nussinov Algorithm, and I have a few questions: maybe more.

What are i and j. Are they just called this because there are paired with each other?

What is K and K + 1 in the bifurcation?

What is happening during initialisation?

From what I can see, one just puts 0s in a matrix , but why is this done? And why is half of the diagonal matrix not completed at all ?

Also, what is happening during recursion?

Please do not explain in pseudocode =).

Thanks in advance.

Nussinov Algorithm • 2.1k views

ADD COMMENT • link 4.4 years ago by M.O.L.S ▴ 100

score 4 · Accepted Answer · 2019-11-01

What are i and j. Are they just called this because there are paired with each other?

i and j are two bases. They might be called a and b, b_i and b_j or bugs and bunny. i and j are only assumed to be paired with each in case 1 of the scoring function.

What is K and K + 1 in the bifurcation?

The bifurcation is needed to deal to model the case when bases i and j belong to two distinct structure motifs (loops), e.g. two hairpins. More concrete: motif 1 = S[i..k] and motif 2 = S[k'..j] with k' = k+1..j-1. This will then give the maximum number of basepairs as the sum of basepairs in motif 1 and motif 2.

What is happening during initialisation? From what I can see, one just puts 0s in a matrix , but why is this done? And why is half of the diagonal matrix not completed at all ?

Given an RNA sequence S, the cell N[i, j] in the Nussinov matrix gives you the maximum number of possible base pairs in the subsequence S[i..j].

Think about what this means for the matrix:

Each pair of bases (i, j) represents a sequence of length j - i + 1. The main diagonal is for sequences of length 1 (S[i..i]) and the one below it for sequences of length 0. Since sequences cannot have negative lengths, the bottom part of the matrix is left unfilled.
There is no base pair in sequences of length 1 and 0, hence N[i, i] = 0 (length 1) and N[i, i-1] = 0, for i in {1..length(S)}. These are the starting cases (hence "Initialization")

Also, what is happening during recursion?

During recursion you go through each subsequence S[i..j] and calculate the maximum number of base pairs possible for that sequence. This is done based on maximal numbers of base pairs that you have calculated (or initialized to 0) in the preceeding steps and the initialization phase.

score 1 · Accepted Answer · 2019-12-08

This is a reply, not an answer... I understand the use of j-i +1. Without this the initialisation doesn't make sense. I appreciate your help, Thank you, although it has brought about a few more questions...

So I understand that i am comparing 1 RNA sequence with itself. If I was doing an alignment I would be inserting gaps and such, so I'm not doing that in this case. The RNA sequence can also be written on the top and left hand side of the matrix, along with the axes numbers and the coordinates for each cell in the matrix. N[i1, j1] excetera.

Can I assume that a sequence of length 0 consists of no ribonucleotides and a sequence of length 1 consists of a single ribonucleotide? Then how do i know what a sequence of length 3 or 4 looks like? and how do I know if the sequence of length 3 or 4 is base paired or not and where the base pairs are?

After initialisation, is it okay to start with sequences of length 2, and see if they base pair or not? Is this the minimum length I can start with, or should i make sure that it is of length 3, so that there is a guaranted base pair somewhere?

Is the output of this recursion the structure of the base paired RNA in dot bracket form or is it the structure of the base paired RNA in ribonucleotide letters?

During recursion, why should I traverse through and fill up the matrix in rows of negative diagonals? Is there a special reason for this?

What is case one of the scoring function?

Also, what am I doing with the Nussinov Algorithm? Am I looking for the structure that has the most base pairs? Am I looking for the structure with the most-likely number of base pairs(not the most, not the least)?

If the scoring functions are:

score i, j-1

score( i+1, j-1) + different_score (i, j)

score (i + 1, j)

Is there a particular one that should be used first, second and/or third, or does it not matter?

Maybe these aren't scoring functions. If not, then the scoring function is put a 1 if there is a base pair that can occur and a 0 if a base pair cannot occur.