In computer architecture, Amdahl's law (or Amdahl's argument[1]) is a formula which gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved.
The law can be stated as:
"the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used".[2]It is named after computer scientist Gene Amdahl, and was presented at the American Federation of Information Processing Societies (AFIPS) Spring Joint Computer Conference in 1967.
Amdahl's law is often used in parallel computing to predict the theoretical speedup when using multiple processors. For example, if a program needs 20 hours to complete using a single thread, and a one-hour portion of the program cannot be parallelized, then only the remaining 19 hours' execution time can be parallelized. Therefore, regardless of how many threads are devoted to a parallelized execution of this program, the minimum execution time is always more than 1 hour. Hence, the theoretical speedup is, at most, 20 times the single thread performance,
\left(\dfrac{1}{1-p}=20\right)
Amdahl's law can be formulated in the following way:
S_\text{latency}(s)=\dfrac{1}{(1-p)+\dfrac{p}{s}}
where S_latency is the theoretical speedup of the execution of the whole task, s is the speedup of the part of the task that benefits from improved system resources, and p is the proportion of execution time that the part benefiting from improved resources originally occupied.
Furthermore,
\begin{cases} S_\text{latency}(s)\leq\dfrac{1}{1-p}\\[8pt] \lim\limits_{s\to\infty}S_\text{latency}(s)=\dfrac{1}{1-p}. \end{cases}
shows that the theoretical speedup of the execution of the whole task increases with the improvement of the resources of the system and that regardless of the magnitude of the improvement, the theoretical speedup is always limited by the part of the task that cannot benefit from the improvement.
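As a rough illustration, the formula can be evaluated directly; the following Python sketch (the function name amdahl_speedup and the sample values p = 0.95 and the list of s values are illustrative choices, not taken from the text) shows the speedup saturating at 1/(1 − p):

def amdahl_speedup(p, s):
    # Theoretical speedup in latency of the whole task, where p is the
    # proportion of execution time that benefits from the improvement and
    # s is the speedup of that part.
    return 1.0 / ((1.0 - p) + p / s)

# With p = 0.95 the speedup can never exceed 1 / (1 - 0.95) = 20,
# no matter how large s becomes.
for s in (2, 8, 64, 1024, 10**6):
    print(s, round(amdahl_speedup(0.95, s), 2))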
Amdahl's law applies only to the cases where the problem size is fixed. In practice, as more computing resources become available, they tend to get used on larger problems (larger datasets), and the time spent in the parallelizable part often grows much faster than the inherently serial work. In this case, Gustafson's law gives a less pessimistic and more realistic assessment of the parallel performance.[3]
A task executed by a system whose resources are improved compared to an initial similar system can be split up into two parts: a part that does not benefit from the improvement of the resources of the system, and a part that benefits from it.
An example is a computer program that processes files. A part of that program may scan the directory of the disk and create a list of files internally in memory. After that, another part of the program passes each file to a separate thread for processing. The part that scans the directory and creates the file list cannot be sped up on a parallel computer, but the part that processes the files can.
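A minimal Python sketch of such a program (the directory path, the per-file work in process_file, and the use of a thread pool are assumptions for illustration, not details from the example above):

import os
from concurrent.futures import ThreadPoolExecutor

def process_file(path):
    # Placeholder per-file work; a real program would parse or transform
    # the file's contents here.
    return os.path.getsize(path)

# Serial part: scan the directory and build the list of files in memory.
directory = "."  # hypothetical input directory
files = [os.path.join(directory, name)
         for name in os.listdir(directory)
         if os.path.isfile(os.path.join(directory, name))]

# Parallelizable part: hand each file to a separate thread for processing.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(process_file, files))

print(len(results), "files processed")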
The execution time of the whole task before the improvement of the resources of the system is denoted as T. The fraction of the execution time of the task that benefits from the improvement of the resources is denoted by p; the fraction concerning the part that does not benefit from it is therefore 1 − p. Then:
T=(1-p)T+pT.
It is the execution of the part that benefits from the improvement of the resources that is accelerated by the factor s after the improvement of the resources. Consequently, the execution time of the part that does not benefit from it remains the same, while the execution time of the part that benefits from it becomes
\dfrac{p}{s}T.
The theoretical execution time T(s) of the whole task after the improvement of the resources is therefore
T(s)=(1-p)T+\dfrac{p}{s}T.
Amdahl's law gives the theoretical speedup in latency of the execution of the whole task at fixed workload W, which yields
S_\text{latency}(s)=\dfrac{TW}{T(s)W}=\dfrac{T}{T(s)}=\dfrac{1}{(1-p)+\dfrac{p}{s}}.
If 30% of the execution time may be the subject of a speedup, p will be 0.3; if the improvement makes the affected part twice as fast, s will be 2. Amdahl's law states that the overall speedup of applying the improvement will be:
S_\text{latency}=\dfrac{1}{1-p+\dfrac{p}{s}}=\dfrac{1}{1-0.3+\dfrac{0.3}{2}}=1.18.
For example, assume that we are given a serial task which is split into four consecutive parts, whose percentages of execution time are p1 = 0.11, p2 = 0.18, p3 = 0.23, and p4 = 0.48 respectively. Then we are told that the 1st part is not sped up, so s1 = 1, while the 2nd part is sped up 5 times, so s2 = 5, the 3rd part is sped up 20 times, so s3 = 20, and the 4th part is sped up 1.6 times, so s4 = 1.6. By using Amdahl's law, the overall speedup is
S_\text{latency}=\dfrac{1}{\dfrac{p_1}{s_1}+\dfrac{p_2}{s_2}+\dfrac{p_3}{s_3}+\dfrac{p_4}{s_4}}=\dfrac{1}{\dfrac{0.11}{1}+\dfrac{0.18}{5}+\dfrac{0.23}{20}+\dfrac{0.48}{1.6}}=2.19.
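The same computation can be written as a small Python sketch (the function name and the explicit lists of fractions and speedups are ours, assuming the figures quoted above):

def overall_speedup(fractions, speedups):
    # Generalized Amdahl's law for a task split into consecutive parts:
    # fractions[i] is the share of the original execution time of part i,
    # speedups[i] is the factor by which part i is accelerated (1 = unchanged).
    assert abs(sum(fractions) - 1.0) < 1e-9
    return 1.0 / sum(f / s for f, s in zip(fractions, speedups))

print(round(overall_speedup([0.11, 0.18, 0.23, 0.48], [1, 5, 20, 1.6]), 2))  # 2.19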
For example, with a serial program in two parts A and B for which T_A = 3 s and T_B = 1 s: if part B is made to run 5 times faster, then s = 5 and p = T_B/(T_A + T_B) = 0.25, giving S_latency = 1.25; if instead part A is made to run 2 times faster, then s = 2 and p = T_A/(T_A + T_B) = 0.75, giving S_latency = 1.60.
Therefore, making part A run 2 times faster is better than making part B run 5 times faster. The percentage improvement in speed can be calculated as
\text{percentage improvement}=100\left(1-\dfrac{1}{S_\text{latency}}\right).
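For instance, substituting the speedup of 1.18 computed earlier (the substitution step is added here for illustration):
\text{percentage improvement}=100\left(1-\dfrac{1}{1.18}\right)\approx15\%.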
If the non-parallelizable part is optimized by a factor of O, then
T(O,s)=(1-p)\dfrac{T}{O}+\dfrac{p}{s}T.
It follows from Amdahl's law that the speedup due to parallelism is given by
S_\text{latency}(O,s)=\dfrac{T(O)}{T(O,s)}=\dfrac{\dfrac{1-p}{O}+p}{\dfrac{1-p}{O}+\dfrac{p}{s}}.
When s = 1, S_\text{latency}(O,s)=1.
When s = \infty,
S_\text{latency}(O,\infty)=\dfrac{T(O)}{T(O,\infty)}=\dfrac{\dfrac{1-p}{O}+p}{\dfrac{1-p}{O}}=1+\dfrac{p}{1-p}O.
If 1 − p = 0.4, O = 2 and s = 5, then
S_\text{latency}(O,s)=\dfrac{T(O)}{T(O,s)}=\dfrac{\dfrac{0.4}{2}+0.6}{\dfrac{0.4}{2}+\dfrac{0.6}{5}}=2.5.
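The following Python sketch (the function name and argument names are illustrative) evaluates this expression and reproduces the figure above:

def amdahl_speedup_serial_opt(p, O, s):
    # Speedup relative to the run in which only the serial optimization O is
    # applied: the non-parallelizable part is divided by O in both runs, and
    # the parallelizable part is additionally divided by s.
    t_serial_only = (1 - p) / O + p        # T(O) / T
    t_both = (1 - p) / O + p / s           # T(O, s) / T
    return t_serial_only / t_both

print(round(amdahl_speedup_serial_opt(p=0.6, O=2, s=5), 2))  # 2.5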
Next, we consider the case wherein the non-parallelizable part is reduced by a factor of O', and the parallelizable part is correspondingly increased. Then
T'(O',s)=\dfrac{1-p}{O'}T+\left(1-\dfrac{1-p}{O'}\right)\dfrac{T}{s}.
It follows from Amdahl's law that the speedup due to parallelism is given by
S'_\text{latency}(O',s)=\dfrac{T'(O')}{T'(O',s)}=\dfrac{1}{\dfrac{1-p}{O'}+\left(1-\dfrac{1-p}{O'}\right)\dfrac{1}{s}}.
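A short Python sketch of this expression (the names are illustrative; the values p = 0.6 and s = 5 are carried over from the earlier example) shows that as O' grows, the remaining serial fraction (1 − p)/O' vanishes and the speedup approaches s:

def speedup_shrinking_serial(p, O_prime, s):
    # The non-parallelizable fraction (1 - p) is reduced by a factor O_prime,
    # and the freed-up share of the work is added to the parallelizable part.
    serial = (1 - p) / O_prime
    return 1.0 / (serial + (1 - serial) / s)

for O_prime in (1, 2, 10, 100):
    print(O_prime, round(speedup_shrinking_serial(0.6, O_prime, 5), 2))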
Amdahl's law is often conflated with the law of diminishing returns, whereas only a special case of applying Amdahl's law demonstrates the law of diminishing returns. If one picks optimally (in terms of the achieved speedup) what is to be improved, then one will see monotonically decreasing improvements as one continues to improve the system. If, however, one picks non-optimally, then after improving a sub-optimal component and moving on to improve a more optimal component, one can see an increase in the return. Note that it is often rational to improve a system in an order that is "non-optimal" in this sense, given that some improvements are more difficult or require more development time than others.
Amdahl's law does represent the law of diminishing returns if one is considering what sort of return one gets by adding more processors to a machine, if one is running a fixed-size computation that will use all available processors to their capacity. Each new processor added to the system will add less usable power than the previous one. Each time one doubles the number of processors the speedup ratio will diminish, as the total throughput heads toward the limit of 1/(1 − p).
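Treating the number of processors N as the speedup factor s of the parallelizable part, a short Python sketch (the value p = 0.95 is an illustrative choice) makes the diminishing returns visible: each doubling of N adds less than the previous one as the speedup approaches 1/(1 − p) = 20.

p = 0.95          # illustrative parallelizable fraction
prev = 1.0
for exponent in range(11):
    n = 2 ** exponent          # number of processors: 1, 2, 4, ..., 1024
    speedup = 1.0 / ((1.0 - p) + p / n)
    print(n, round(speedup, 2), "gain over previous:", round(speedup / prev, 2))
    prev = speedup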
This analysis neglects other potential bottlenecks such as memory bandwidth and I/O bandwidth. If these resources do not scale with the number of processors, then merely adding processors provides even lower returns.
An implication of Amdahl's law is that to speed up real applications which have both serial and parallel portions, heterogeneous computing techniques are required.[4] There are novel speedup and energy consumption models based on a more general representation of heterogeneity, referred to as the normal form heterogeneity, that support a wide range of heterogeneous many-core architectures. These modelling methods aim to predict system power efficiency and performance ranges, and facilitate research and development at the hardware and system software levels.[5] [6]