Class: | Nucleic acid structure prediction |
Time: | O(n3) |
Space: | O(n2) |
The Nussinov algorithm is a nucleic acid structure prediction algorithm used in computational biology to predict the folding of an RNA molecule that makes use of dynamic programming principles.[1] The algorithm was developed by Ruth Nussinov in the late 1970s.
RNA origami occurs when an RNA molecule "folds" and binds to itself. This folding often determines the function of the RNA molecule. RNA folds at different levels, this algorithm predicts the secondary structure of the RNA.
We score a solution by counting the total number of paired bases. Thus, attempting to maximize the score that maximizes the total number of bonds between bases.
Consider an RNA sequence
S
\{A,U,C,G\}
Si
Sj-1
Su
Sv
i\lequ\leqv\leqj-1
Si
Sj
Sj
Si
Sj-1
Si
Sj-1
Sj
Sk
i\leqk<j
Si
Sk-1
Sk+1
Sj-1
Consider an RNA sequence
S
n
Si\in\{A,U,C,G\}
Construct an
n x n
M
M
M(i,i)=0
M(i,i-1)=0
for
1\leqi\leqn
M(i,j)
Si...Sj
M
M(i,j)=maxi\leq\begin{cases}M(i,k-1)+M(k+1,j-1)+Score(Sk,Sj)\ M(i,j-1)\end{cases}
where
Score(Sk,Sj)=\begin{cases}1,&SkandSjcomplementary\\ 0,&otherwise.\end{cases}
After this step, we have a matrix
M
M(i,j)
Si...Sj
To determine the structure of the folded RNA by traceback, we first create an empty list of pairs
P
i=1,j=n
j\leqi
M(i,j)=M(i,j-1)
i=i,j=j-1
k:i\leqk<j
Sk
Sj
M(i,j)=M(i,k-1)+M(k+1,j-1)+1
(k,j)
P
i=i,j=k-1
i=k+1,j=j-1
When the traceback finishes,
P
The Nussinov algorithm does not account for the three-dimensional shape of RNA, nor predict RNA pseudoknots.[2] Furthermore, in its basic form, it does not account for a minimum stem loop size. However, it is still useful as a fast algorithm for basic prediction of secondary structure.