Iron law of processor performance explained

In computer architecture, the iron law of processor performance (or simply iron law of performance) describes the performance trade-off between complexity and the number of primitive instructions that processors use to perform calculations.[1] This formulation of the trade-off spurred the development of Reduced Instruction Set Computers (RISC) whose instruction set architectures (ISAs) leverage a smaller set of core instructions to improve performance. The term was coined by Douglas Clark based on research performed by Clark and Joel Emer in the 1980s.[2]

Explanation

The performance of a processor is the time it takes to execute a program: \mathrm{\tfrac{Time}{Program}}. This can be further broken down into three factors:[3]

\mathrm{\tfrac{Time}{Program} = \tfrac{Instructions}{Program} \times \tfrac{ClockCycles}{Instruction} \times \tfrac{Time}{ClockCycles}}

Selection of an instruction set architecture affects \mathrm{\tfrac{Instructions}{Program} \times \tfrac{ClockCycles}{Instruction}}, whereas \mathrm{\tfrac{Time}{ClockCycles}} is largely determined by the manufacturing technology. Classic Complex Instruction Set Computer (CISC) ISAs optimized \mathrm{\tfrac{Instructions}{Program}} by providing a larger set of more complex CPU instructions. Generally speaking, however, complex instructions inflate the number of clock cycles per instruction, \mathrm{\tfrac{ClockCycles}{Instruction}}, because they must be decoded into simpler micro-operations actually performed by the hardware. After converting x86 binary to the micro-operations used internally, the total number of operations is close to what is produced for a comparable RISC ISA.[4] The iron law of processor performance makes this trade-off explicit and pushes for optimization of \mathrm{\tfrac{Time}{Program}} as a whole, not just a single component.
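
As a rough illustration of how the three factors trade off, the short Python sketch below plugs hypothetical numbers into the iron law; the instruction counts and cycles-per-instruction figures for the "CISC-like" and "RISC-like" machines are made-up assumptions, not measurements.

    # Iron law: Time/Program = (Instructions/Program) x (ClockCycles/Instruction) x (Time/ClockCycles)
    # All parameter values below are illustrative assumptions, not benchmark data.
    def execution_time(instructions, cycles_per_instruction, seconds_per_cycle):
        """Total program execution time in seconds, per the iron law."""
        return instructions * cycles_per_instruction * seconds_per_cycle

    clock_period = 1 / 2e9  # same fabrication technology for both machines: a 2 GHz clock

    # Hypothetical CISC-like machine: fewer, more complex instructions, but more cycles each.
    cisc_time = execution_time(instructions=80_000_000,
                               cycles_per_instruction=3.0,
                               seconds_per_cycle=clock_period)

    # Hypothetical RISC-like machine: more instructions, but close to one cycle each.
    risc_time = execution_time(instructions=120_000_000,
                               cycles_per_instruction=1.2,
                               seconds_per_cycle=clock_period)

    print(f"CISC-like: {cisc_time * 1e3:.0f} ms, RISC-like: {risc_time * 1e3:.0f} ms")

Under these assumed numbers the RISC-like machine finishes first (72 ms versus 120 ms) despite executing 50% more instructions, because its lower cycles-per-instruction more than compensates; a different set of assumptions could just as easily favor the CISC-like design, which is exactly why the iron law pushes for optimizing the whole product.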

While the iron law is credited for sparking the development of RISC architectures, it does not imply that a simpler ISA is always faster. If that were the case, the fastest ISA would consist of simple binary logic. A single CISC instruction can be faster than the equivalent set of RISC instructions when it enables multiple micro-operations to be performed in a single clock cycle. In practice, however, the regularity of RISC instructions allowed a pipelined implementation where the total execution time of an instruction was (typically) ~5 clock cycles, but each instruction followed the previous instruction ~1 clock cycle later. RISC processors can also achieve higher performance using techniques such as modular extensions, predictive logic, compressed instructions, and macro-operation fusion.[5]
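
To make the pipelining arithmetic above concrete, here is a short Python sketch of an idealized 5-stage pipeline with no stalls or hazards (an assumption made purely for illustration): the first instruction still takes 5 cycles, but every later instruction completes 1 cycle after its predecessor, so the effective cycles per instruction approaches 1.

    # Idealized pipelining arithmetic: 5-cycle instruction latency, one instruction issued per cycle.
    # Assumes no stalls, hazards, or branch penalties -- purely illustrative.
    STAGES = 5
    n = 1_000_000  # instructions executed by the program

    unpipelined_cycles = n * STAGES      # each instruction waits for the previous one to finish
    pipelined_cycles = STAGES + (n - 1)  # fill the pipeline once, then ~1 instruction per cycle

    print(unpipelined_cycles / n)  # effective ClockCycles/Instruction = 5.0
    print(pipelined_cycles / n)    # ~1.000004, approaching 1.0 as n grows

In iron-law terms, pipelining leaves Instructions/Program and the clock period roughly unchanged while cutting ClockCycles/Instruction from ~5 toward ~1, which is where the performance win in this example comes from.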

Notes and References

  1. Eeckhout, Lieven. Computer Architecture Performance Evaluation Methods. Morgan & Claypool, 2010, pp. 5–6. ISBN 9781608454679. Retrieved 9 March 2021.
  2. Emer, Joel; Clark, Douglas (1984). "A Characterization of Processor Performance in the VAX-11/780". http://emer.org/Family/Joel/Professional/papers/1984-isca-vax.pdf
  3. Asanović, Krste (2019). "Lecture 4 - Pipelining" (lecture slides), p. 2. Department of Electrical Engineering and Computer Sciences at UC Berkeley. Archived 2021-03-12 at https://archive.today/20210312020632/https://inst.eecs.berkeley.edu/~cs152/sp09/lectures/L04-Pipelining.pdf. Retrieved 2020-03-11.
  4. Celio, Christopher; Dabbelt, Palmer; Patterson, David A.; Asanović, Krste (2016-07-08). "The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V". arXiv:1607.02318 [cs.AR].
  5. Engheim, Erik (2020-12-28). "The Genius of RISC-V Microprocessors". Medium. Retrieved 2021-03-11.