A point accepted mutation — also known as a PAM — is the replacement of a single amino acid in the primary structure of a protein with another single amino acid, which is accepted by the processes of natural selection. This definition does not include all point mutations in the DNA of an organism. In particular, silent mutations are not point accepted mutations, nor are mutations that are lethal or that are rejected by natural selection in other ways.
A PAM matrix is a matrix where each column and row represents one of the twenty standard amino acids. In bioinformatics, PAM matrices are sometimes used as substitution matrices to score sequence alignments for proteins. Each entry in a PAM matrix indicates the likelihood of the amino acid of that row being replaced with the amino acid of that column through a series of one or more point accepted mutations during a specified evolutionary interval, rather than these two amino acids being aligned due to chance. Different PAM matrices correspond to different lengths of time in the evolution of the protein sequence.
The genetic instructions of every replicating cell in a living organism are contained within its DNA. Throughout the cell's lifetime, this information is transcribed and replicated by cellular mechanisms to produce proteins or to provide instructions for daughter cells during cell division, and the possibility exists that the DNA may be altered during these processes.[1] [2] This is known as a mutation. At the molecular level, there are regulatory systems that correct most — but not all — of these changes to the DNA before it is replicated.[2] [3]
One of the possible mutations that occurs is the replacement of a single nucleotide, known as a point mutation. If a point mutation occurs within an expressed region of a gene, an exon, then this will change the codon specifying a particular amino acid in the protein produced by that gene.[2] Despite the redundancy in the genetic code, there is a possibility that this mutation will then change the amino acid that is produced during translation, and as a consequence the structure of the protein will be changed.
The functionality of a protein is highly dependent on its structure.[4] Changing a single amino acid in a protein may reduce its ability to carry out this function, or the mutation may even change the function that the protein carries out.[2] Changes like these may severely impact a crucial function in a cell, potentially causing the cell — and in extreme cases, the organism — to die.[5] Conversely, the change may allow the cell to continue functioning albeit differently, and the mutation can be passed on to the organism's offspring. If this change does not result in any significant physical disadvantage to the offspring, the possibility exists that this mutation will persist within the population. The possibility also exists that the change in function becomes advantageous. In either case, while being subjected to the processes of natural selection, the point mutation has been accepted into the genetic pool.
The 20 amino acids translated by the genetic code vary greatly in the physical and chemical properties of their side chains.[4] However, these amino acids can be categorised into groups with similar physicochemical properties.[4] Substituting an amino acid with another from the same category is likely to have a smaller impact on the structure and function of a protein than replacement with an amino acid from a different category. Consequently, acceptance of point mutations depends heavily on both the amino acid being replaced and the replacement amino acid. The PAM matrices are a mathematical tool that accounts for these varying rates of acceptance when evaluating the similarity of proteins during alignment.
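The term accepted point mutation was initially used to describe the mutation phenomenon. However, the acronym PAM was preferred over APM for readability, so the term point accepted mutation is used more regularly.[6] Because the value $n$ in the PAMn matrix counts mutations per 100 amino acids, which resembles a percentage, the term percent accepted mutation is sometimes used.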
It is important to distinguish between point accepted mutations (PAMs), point accepted mutation matrices (PAM matrices) and the PAMn matrix. The term 'point accepted mutation' refers to the mutation event itself. 'PAM matrix' refers to one of a family of matrices whose scores represent the likelihood of two amino acids being aligned due to a series of mutation events, rather than due to random chance. The 'PAMn matrix' is the PAM matrix corresponding to a time frame long enough for $n$ mutations to occur per 100 amino acids.
PAM matrices were introduced by Margaret Dayhoff in 1978.[7] The calculation of these matrices was based on 1572 observed mutations in the phylogenetic trees of 71 families of closely related proteins. The proteins to be studied were selected on the basis of having high similarity with their predecessors. The protein alignments included were required to display at least 85% identity.[6] [8] As a result, it is reasonable to assume that any aligned mismatches were the result of a single mutation event, rather than several at the same location.
Each PAM matrix has twenty rows and twenty columns — one representing each of the twenty amino acids translated by the genetic code. The value in each cell of a PAM matrix is related to the probability of a row amino acid before the mutation being aligned with a column amino acid afterwards.[6] [7] [8] From this definition, PAM matrices are an example of a substitution matrix.
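For each branch in the phylogenetic trees of the protein families, the number of observed mismatches was recorded, together with the two amino acids involved in each.[7] These counts were used as entries below the main diagonal of the matrix $A$. The matrix $A$ is taken to be symmetric, $A(i,j) = A(j,i)$, so the entries above the main diagonal follow from those below it. The diagonal entries of $A$ are not used in what follows.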
In addition to these counts, data on the mutability and the frequency of the amino acids was obtained.[6] [7] The mutability of an amino acid is the ratio of the number of mutations it is involved in to the number of times it occurs in an alignment.[7] Mutability measures how likely an amino acid is to mutate acceptably. Asparagine, an amino acid with a small polar side chain, was found to be the most mutable of the amino acids.[7] Cysteine and tryptophan were found to be the least mutable amino acids.[7] The side chains of cysteine and tryptophan have less common structures: cysteine's side chain contains sulfur, which participates in disulfide bonds with other cysteine molecules, and tryptophan's side chain is large and aromatic.[4] Since there are several small polar amino acids, these extremes suggest that amino acids are more likely to mutate acceptably if their physical and chemical properties are more common among alternative amino acids.[6] [8]
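For the $j$th amino acid, $m(j)$ and $f(j)$ denote its mutability and frequency. The frequencies are normalised so that they sum to 1. If the $j$th amino acid occurs $n(j)$ times in total and $N$ is the total number of amino acid occurrences, then

$$f(j) = \frac{n(j)}{N}$$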
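Based on the definition of mutability as the ratio of mutations to occurrences of an amino acid,

$$m(j) = \frac{\sum_{i=1,\, i \neq j}^{20} A(i,j)}{n(j)}$$

or, equivalently,

$$\frac{1}{N f(j)} \sum_{i=1,\, i \neq j}^{20} A(i,j) = \frac{1}{n(j)} \sum_{i=1,\, i \neq j}^{20} A(i,j) = m(j)$$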
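The definitions of frequency and mutability can be sketched in a few lines of Python. The three-letter alphabet, the count matrix `A`, and the occurrence counts `n` are invented toy values, not Dayhoff's data; a real calculation would use all twenty amino acids.

```python
# Toy data (hypothetical): symmetric mutation counts A[i][j] and
# occurrence counts n[j] for a three-letter alphabet.
A = [
    [0, 30, 10],
    [30, 0, 20],
    [10, 20, 0],
]
n = [1000, 2000, 1000]


def frequencies(n):
    """Return f(j) = n(j) / N, where N is the total number of occurrences."""
    total = sum(n)
    return [count / total for count in n]


def mutabilities(A, n):
    """Return m(j) = (sum over i != j of A(i, j)) / n(j)."""
    size = len(n)
    return [
        sum(A[i][j] for i in range(size) if i != j) / n[j]
        for j in range(size)
    ]


f = frequencies(n)
m = mutabilities(A, n)
print(f)
print(m)
```

The frequencies sum to 1 by construction, while the mutabilities need not.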
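The mutation matrix $M$ is constructed so that the entry $M(i,j)$ is the probability of the $j$th amino acid mutating into the $i$th amino acid. The non-diagonal entries are given by

$$M(i,j) = \lambda A(i,j)\,\frac{m(j)}{\sum_{k=1,\, k \neq j}^{20} A(k,j)} = \frac{\lambda A(i,j)}{N f(j)} = \frac{\lambda A(i,j)}{n(j)}$$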
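where $\lambda$ is a constant of proportionality. This equation does not give the diagonal entries. Each column of $M$ lists the twenty possible outcomes for one amino acid: it can mutate into one of the 19 others or remain unchanged. Since these probabilities must sum to 1, the diagonal entry is

$$M(j,j) = 1 - \sum_{i=1,\, i \neq j}^{20} M(i,j)$$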
which simplifies to[7]
$$M(j,j) = 1 - \lambda m(j)$$
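As a minimal sketch, the construction of the mutation matrix might look as follows in Python. The counts `A`, the occurrences `n`, and the value `lam = 0.5` are hypothetical toy inputs chosen only to make the arithmetic easy to follow; `M[i][j]` follows the convention in the text (column `j` mutates into row `i`).

```python
def mutation_matrix(A, n, lam):
    """Build M, where M[i][j] is the probability that letter j mutates into i."""
    size = len(n)
    M = [[0.0] * size for _ in range(size)]
    for j in range(size):
        for i in range(size):
            if i != j:
                # Off-diagonal entry: lambda * A(i, j) / n(j)
                M[i][j] = lam * A[i][j] / n[j]
        # Diagonal entry: probability that letter j stays unchanged.
        M[j][j] = 1.0 - sum(M[i][j] for i in range(size) if i != j)
    return M


# Toy inputs (hypothetical), not Dayhoff's data.
A = [
    [0, 30, 10],
    [30, 0, 20],
    [10, 20, 0],
]
n = [1000, 2000, 1000]
lam = 0.5

M = mutation_matrix(A, n, lam)
column_sums = [sum(M[i][j] for i in range(3)) for j in range(3)]
print(M)
print(column_sums)
```

Each column sums to 1, and the diagonal entry for the first letter equals 1 - 0.5 × 0.04 = 0.98, matching the simplified form of the diagonal.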
A result of particular significance is that for the non-diagonal entries

$$f(j)M(i,j) = \frac{\lambda}{N}A(i,j) = \frac{\lambda}{N}A(j,i) = f(i)M(j,i)$$

Since the case $i = j$ holds trivially, this means that for all entries in the mutation matrix

$$f(j)M(i,j) = f(i)M(j,i)$$
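The probabilities contained in $M$ depend on the length of time over which the protein sequence is allowed to mutate. The values of $M$ are therefore calculated for a short time frame, and the matrices for longer periods are derived from it. The constant $\lambda$ controls the proportion of amino acids that remain unchanged, and so determines the value of $n$ to which the matrix corresponds.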
To find the mutation matrix for the PAM1 matrix, the requirement that 99% of the amino acids in a sequence are conserved is imposed. The quantity $n(j)M(j,j)$ is the number of conserved units of the $j$th amino acid, so the total number of conserved amino acids is

$$\sum_{j=1}^{20} n(j)M(j,j) = \sum_{j=1}^{20} n(j) - \lambda\sum_{j=1}^{20} n(j)m(j) = N - N\lambda\sum_{j=1}^{20} f(j)m(j)$$
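The value of $\lambda$ that yields 99% identity after mutation is then given by

$$0.99 = 1 - \lambda\sum_{j=1}^{20} f(j)m(j)$$

This $\lambda$ value can then be used in the mutation matrix for the PAM1 matrix.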
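Solving the 99% conservation condition for $\lambda$ is a one-line calculation. The sketch below uses hypothetical frequencies and mutabilities for a three-letter alphabet; the function name `pam1_lambda` is illustrative only.

```python
def pam1_lambda(f, m, conserved=0.99):
    """Solve conserved = 1 - lambda * sum_j f(j) * m(j) for lambda."""
    weighted_mutability = sum(fj * mj for fj, mj in zip(f, m))
    return (1.0 - conserved) / weighted_mutability


# Toy values (hypothetical), not Dayhoff's data.
f = [0.25, 0.5, 0.25]
m = [0.04, 0.025, 0.03]

lam = pam1_lambda(f, m)

# Check: with M(j, j) = 1 - lambda * m(j), the conserved fraction is
# sum_j f(j) * M(j, j), which should come back as 0.99.
conserved_fraction = sum(fj * (1.0 - lam * mj) for fj, mj in zip(f, m))
print(lam, conserved_fraction)
```

Here the weighted mutability is 0.03, so $\lambda$ is approximately 1/3.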
The Markov chain model of protein mutation relates the mutation matrix for PAMn, $M_n$, to the mutation matrix for PAM1, $M_1$, by

$$M_n = M_1^n$$
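The matrix power can be computed by repeated squaring. The following is a pure-Python sketch; the 2×2 matrix `M1` is a hypothetical column-stochastic toy matrix, not a real PAM1 matrix, and `mat_mul` and `mat_pow` are illustrative helper names.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    size = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def mat_pow(M, n):
    """Return M raised to the positive integer power n by repeated squaring."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = None
    base = M
    while n:
        if n & 1:
            result = base if result is None else mat_mul(result, base)
        n >>= 1
        if n:
            base = mat_mul(base, base)
    return result


# Toy matrix (hypothetical): M1[i][j] is the probability that j becomes i.
M1 = [
    [0.97, 0.01],
    [0.03, 0.99],
]

M250 = mat_pow(M1, 250)
print(M250)
```

For large $n$ every column of the result approaches the same vector, which for this toy matrix is (0.25, 0.75): after many steps the outcome no longer depends on the starting amino acid.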
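The PAMn matrix is constructed from the ratio of the probability of point accepted mutations replacing the $j$th amino acid with the $i$th amino acid, to the probability of these two amino acids being aligned by chance. Its entries are given by

$$\mathrm{PAM}_n(i,j) = \log\frac{f(j)M_n(i,j)}{f(i)f(j)} = \log\frac{f(j)M^n(i,j)}{f(i)f(j)} = \log\frac{M^n(i,j)}{f(i)}$$

where $M^n$ is the $n$th power of the PAM1 mutation matrix $M$.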
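A minimal sketch of the log-odds step is shown below. The text does not fix the base of the logarithm, so the base-10 default here is an assumption, as are the toy values of `f` and `Mn`; published PAM tables are additionally scaled and rounded, which this sketch does not attempt.

```python
import math


def pam_log_odds(Mn, f, log=math.log10):
    """Return the matrix of log(Mn(i, j) / f(i)) for all i, j."""
    size = len(f)
    return [
        [log(Mn[i][j] / f[i]) for j in range(size)]
        for i in range(size)
    ]


# Toy inputs (hypothetical): Mn[i][j] is the probability that j becomes i.
f = [0.25, 0.75]
Mn = [
    [0.97, 0.01],
    [0.03, 0.99],
]

scores = pam_log_odds(Mn, f)
for row in scores:
    print(row)
```

Diagonal entries come out positive (identity is more likely under relatedness than by chance) and the off-diagonal entries negative for this toy matrix.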
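Note that in Gusfield's book, the entries $M(i,j)$ and $\mathrm{PAM}_n(i,j)$ are defined with the roles of the indices reversed: they relate to the probability of the $i$th amino acid mutating into the $j$th amino acid.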
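When using the PAMn matrix to score an alignment of two proteins, the following assumption is made:

If these two proteins are related, the evolutionary interval separating them is the time taken for $n$ point accepted mutations to occur per 100 amino acids.

When the alignment of the $i$th and $j$th amino acids is considered, the score indicates the relative likelihood of the alignment arising because the proteins are related rather than by chance.

If the proteins are related, suppose the $j$th amino acid is the original. Based on the abundance of amino acids in proteins, the probability that the $j$th amino acid is the original is $f(j)$. The probability that it is replaced by the $i$th amino acid in the assumed time interval is $M_n(i,j)$, so the probability of the alignment is $f(j)M_n(i,j)$, the numerator within the logarithm.

If the proteins are not related, the events that the two aligned amino acids are the $i$th and the $j$th are independent. The probabilities of these events are $f(i)$ and $f(j)$, so the probability of the alignment is $f(i)f(j)$, the denominator within the logarithm.

The logarithm therefore gives a positive entry if the alignment is more likely due to point accepted mutations, and a negative entry if it is more likely due to chance.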
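While the mutation probability matrix $M$ is not symmetric, each of the PAM matrices is. This property follows from the relationship noted earlier for the mutation probability matrix:

$$f(j)M(i,j) = f(i)M(j,i)$$

In fact, this relationship holds for all positive integer powers of the matrix $M$:

$$f(j)M^n(i,j) = f(i)M^n(j,i)$$

As a result, the entries of the PAMn matrix are symmetric, since

$$\mathrm{PAM}_n(i,j) = \log\frac{f(j)M^n(i,j)}{f(j)f(i)} = \log\frac{f(i)M^n(j,i)}{f(i)f(j)} = \mathrm{PAM}_n(j,i)$$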
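The claim that the balance relation survives matrix powers can be checked numerically. The sketch below uses a hypothetical 2×2 matrix `M` and frequencies `f` that satisfy the relation; `satisfies_balance` is an illustrative helper name, and a numerical check on a toy case is of course not a proof.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    size = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def satisfies_balance(M, f, tol=1e-12):
    """Check that f(j) * M(i, j) == f(i) * M(j, i) for every pair (i, j)."""
    size = len(f)
    return all(
        abs(f[j] * M[i][j] - f[i] * M[j][i]) <= tol
        for i in range(size)
        for j in range(size)
    )


# Toy inputs (hypothetical): M[i][j] is the probability that j becomes i.
f = [0.25, 0.75]
M = [
    [0.97, 0.01],
    [0.03, 0.99],
]

power = M
checks = []
for n in range(1, 6):
    checks.append(satisfies_balance(power, f))  # test M^n
    power = mat_mul(power, M)

print(checks)
```

A column-stochastic matrix that does not satisfy the relation, such as `[[0.9, 0.2], [0.1, 0.8]]` with equal frequencies, fails the same check, and its log-odds matrix would not be symmetric.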
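The value $n$ is the number of mutations that occur per 100 amino acids, but it is rarely directly observable and must usually be estimated. When comparing two proteins it is straightforward to calculate $m$ instead, the number of mutated amino acids per 100 amino acids (not to be confused with the mutability $m(j)$). Despite the random nature of mutation, the two values can be approximately related by

$$\frac{m}{100} = 1 - e^{-n/100}$$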
The validity of these estimates can be verified by counting the number of amino acids that remain unchanged under the action of the matrix $M$. The total number of unchanged amino acids over the time interval of the PAMn matrix is

$$\sum_{j=1}^{20} n(j)M^n(j,j)$$

and so the proportion of unchanged amino acids is

$$\frac{\sum_{j=1}^{20} n(j)M^n(j,j)}{N} = \sum_{j=1}^{20} f(j)M^n(j,j) = 1 - \frac{m}{100}$$
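The approximate relation between $n$ and $m$ can be evaluated in either direction. In the sketch below the function names are illustrative, and the inverse formula is simply the same relation solved for $n$; both inherit the approximation made in the text.

```python
import math


def percent_changed(n):
    """Approximate m (changed residues per 100) from the PAM distance n."""
    return 100.0 * (1.0 - math.exp(-n / 100.0))


def pam_distance(m):
    """Invert the relation: approximate n from m, for 0 <= m < 100."""
    if not 0 <= m < 100:
        raise ValueError("m must satisfy 0 <= m < 100")
    return -100.0 * math.log(1.0 - m / 100.0)


for n in (1, 50, 100, 250):
    print(n, round(percent_changed(n), 2))
```

For small $n$ the two quantities nearly coincide, while for large $n$ the value of $m$ saturates below 100, because later mutations increasingly fall on sites that have already changed.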
PAM250 is a commonly used scoring matrix for sequence comparison. Only the lower half of the matrix needs to be computed, since by construction PAM matrices are symmetric. The 20 amino acids are listed along the top and side of the matrix, together with 3 additional ambiguous amino acid codes. The amino acids are most commonly listed alphabetically or in groups that share physicochemical characteristics.[7]
The molecular clock hypothesis predicts that the rate of amino acid substitution in a particular protein will be approximately constant over time, though this rate may vary between protein families.[13] This suggests that the number of mutations per amino acid in a protein increases approximately linearly with time.
Determining the time at which two proteins diverged is an important task in phylogenetics. Fossil records are often used to establish the position of events on the timeline of the Earth's evolutionary history, but the application of this source is limited. However, if the rate at which the molecular clock of a protein family ticks (that is, the rate at which the number of mutations per amino acid increases) is known, then knowing this number of mutations would allow the date of divergence to be found.
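Suppose the date of divergence for two related proteins, taken from organisms living today, is sought. Both proteins have been accumulating accepted mutations since the date of divergence, so the total number of mutations per amino acid separating them is approximately twice that which separates each of them from their common ancestor. If a range of PAM matrices is used to align two proteins that are known to be related, then the value of $n$ in the PAMn matrix that produces the best score is the most likely to correspond to the number of mutations separating the two proteins. Halving the number of mutations per amino acid and dividing by the rate at which accepted mutations accumulate gives an estimate of the time since the two proteins diverged from their common ancestor:

$$T = \frac{K}{2r}$$

where $K$ is the number of mutations per amino acid separating the two proteins (so $K = n/100$) and $r$ is the rate of accepted mutation accumulation, in mutations per amino acid site per unit time (for example, per million years, which gives $T$ in millions of years).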
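As a worked example of the divergence-time formula, the sketch below plugs in invented numbers: a best-scoring PAM50 matrix (so $K = 0.5$) and a hypothetical rate of 0.001 mutations per site per million years. Neither value comes from a real protein family.

```python
def divergence_time(K, r):
    """Return T = K / (2 * r).

    K: mutations per amino acid separating the two proteins.
    r: accepted mutations per amino acid site per unit time.
    The result is expressed in the same time unit as r.
    """
    if r <= 0:
        raise ValueError("r must be positive")
    return K / (2.0 * r)


# Hypothetical inputs: best-scoring matrix PAM50, so K = 50 / 100.
K = 50 / 100
r = 0.001  # per site per million years (invented value)

T = divergence_time(K, r)
print(T)  # in millions of years
```

The factor of 2 reflects that both lineages have been accumulating mutations since the split.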
PAM matrices are also used as a scoring matrix when comparing DNA sequences or protein sequences to judge the quality of the alignment. This form of scoring system is utilized by a wide range of alignment software including BLAST.[15]
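Scoring an ungapped alignment with a substitution matrix amounts to summing one table entry per aligned pair. The sketch below uses a tiny table, `TOY_SCORES`, whose values are illustrative placeholders rather than an authoritative PAM matrix; it does not handle gaps and is not how BLAST itself is implemented.

```python
# Illustrative scores for a two-letter alphabet (placeholder values).
# Only one ordering of each pair is stored because the matrix is symmetric.
TOY_SCORES = {
    ("A", "A"): 2,
    ("A", "W"): -6,
    ("W", "W"): 17,
}


def pair_score(a, b, scores):
    """Look up the score for an aligned pair, in either order."""
    key = (a, b) if (a, b) in scores else (b, a)
    return scores[key]


def score_alignment(seq_a, seq_b, scores):
    """Sum substitution scores over an ungapped alignment of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b, scores) for a, b in zip(seq_a, seq_b))


total = score_alignment("AWA", "AWW", TOY_SCORES)
print(total)
```

Because the entries are log-odds, adding them corresponds to multiplying the underlying likelihood ratios across positions, under the assumption that positions evolve independently.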
Although the PAM log-odds matrices were the first scoring matrices used with BLAST, they have largely been replaced by the BLOSUM matrices. Although both families produce similar scoring outcomes, they were generated using different methodologies. The BLOSUM matrices were derived directly from the amino acid differences in aligned blocks that have diverged to varying degrees, whereas the PAM matrices extrapolate evolutionary information from closely related sequences to longer timescales.[16] Because the scoring information for the two families was generated in very different ways, the numbers associated with the matrices have fundamentally different meanings: the numbers for PAM matrices increase for comparisons among more divergent proteins, whereas the numbers for BLOSUM matrices decrease.[17] However, all amino acid substitution matrices can be compared in an information-theoretic framework[18] using their relative entropy.
| PAM | BLOSUM | Relative entropy (bits) |
|---|---|---|
| PAM100 | Blosum90 | 1.18 |
| PAM120 | Blosum89 | 0.98 |
| PAM160 | Blosum60 | 0.70 |
| PAM200 | Blosum52 | 0.51 |
| PAM250 | Blosum45 | 0.36 |
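The relative entropy of a PAM-style matrix can be sketched as the expected log-odds score under the related-sequence model. The code below assumes the usual information-theoretic definition, with pair probability $f(j)M_n(i,j)$ and base-2 logarithms so that the result is in bits; the inputs are hypothetical toy values, and the function name is illustrative.

```python
import math


def relative_entropy_bits(Mn, f):
    """Return sum over i, j of f(j) * Mn(i, j) * log2(Mn(i, j) / f(i))."""
    size = len(f)
    total = 0.0
    for i in range(size):
        for j in range(size):
            q = f[j] * Mn[i][j]  # probability of the aligned pair if related
            if q > 0:
                total += q * math.log2(Mn[i][j] / f[i])
    return total


# Toy inputs (hypothetical): Mn[i][j] is the probability that j becomes i.
f = [0.25, 0.75]
Mn = [
    [0.97, 0.01],
    [0.03, 0.99],
]
print(relative_entropy_bits(Mn, f))

# Fully diverged case: every column equals f, so aligned pairs carry
# no information and the relative entropy is zero.
chance = [
    [0.25, 0.25],
    [0.75, 0.75],
]
print(relative_entropy_bits(chance, f))
```

This is consistent with the trend in the table: matrices for longer evolutionary intervals sit closer to the chance model and therefore have lower relative entropy.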