Probalign is a sequence alignment tool that calculates a maximum expected accuracy alignment using partition function posterior probabilities.[1] Base pair probabilities are estimated using an estimate similar to Boltzmann distribution. The partition function is calculated using a dynamic programming approach.
The following describes the algorithm used by probalign to determine the base pair probabilities.[2]
To score an alignment of two sequences two things are needed:
\sigma(x,y)
g(k)=\alpha+\betak
S(a)
S(a)=
\sum | |
xi-yj\ina |
\sigma(xi,yj)+gapcost
Now the boltzmann weighted score of an alignment a is:
| ||||
e |
=
| ||||||||||
e |
=\left(
\prod | |
xi-yi\ina |
| ||||
e |
\right) ⋅
| ||||
e |
Where
T
The probability of an alignment assuming boltzmann distribution is given by
Pr[a|x,y]=
| ||||||||
Z |
Where
Z
Let
Zi,j
x0,x1,...,xi
y0,y1,...,yj
M | |
Z | |
i,j |
:
I | |
Z | |
i,j |
:
(-,yj)
D | |
Z | |
i,j |
:
(xi,-)
Zi,j=
M | |
Z | |
i,j |
+
D | |
Z | |
i,j |
+
I | |
Z | |
i,j |
The matrixes are initialized as follows:
M | |
Z | |
0,j |
=
M | |
Z | |
i,0 |
=0
M | |
Z | |
0,0 |
=1
D | |
Z | |
0,j |
=0
I | |
Z | |
i,0 |
=0
The partition function for the alignments of two sequences
x
y
Z|x|,|y|
M | |
Z | |
i,j |
=Zi-1,j-1 ⋅
| ||||
e |
D | |
Z | |
i,j |
=
D | |
Z | |
i-1,j |
⋅
| ||||
e |
+
M | |
Z | |
i-1,j |
⋅
| ||||
e |
+
I | |
Z | |
i-1,j |
⋅
| ||||
e |
I | |
Z | |
i,j |
Finally the probability that positions
xi
yj
P(xi-yj|x,y)=
| ||||||||||||||
Z|x|,|y| |
Z',i',j'
Z