Intel microcode explained

Intel microcode is microcode that runs inside x86 processors made by Intel. Since the P6 microarchitecture introduced in the mid-1990s, the microcode programs can be patched by the operating system or BIOS firmware to work around bugs found in the CPU after release. Intel had originally designed microcode updates for processor debugging under its design for testing (DFT) initiative.[1]

Following the Pentium FDIV bug, the patchable microcode function took on a wider purpose to allow in-field updating without needing to do a product recall.

In the P6 and later microarchitectures, x86 instructions are internally converted into simpler RISC-style micro-operations that are specific to a particular processor and stepping level.

Pre-P6 microcode

On the Intel 80486 and AMD Am486 there are approximately 5000 lines of microcode assembly, totalling approximately 240 Kbits stored in the microcode ROM.[2]
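Both approximations follow from the figures in the court findings cited above. The in-circuit emulation (ICE) microcode is 250 lines and 12,032 bits, and it is "about five percent" of the total. The following quick check treats that 5% as exact, which is an assumption:

```python
# Figures quoted in the 1994 Intel v. AMD findings (reference 2).
ICE_LINES = 250        # lines of "ICE" microcode in the 486
ICE_BITS = 12_032      # bits occupied by those lines
ICE_FRACTION = 0.05    # "about five percent" of all 486 microcode, treated as exact here

total_lines = ICE_LINES / ICE_FRACTION   # estimated lines of 486 microcode
total_bits = ICE_BITS / ICE_FRACTION     # estimated microcode ROM bits
bits_per_line = ICE_BITS / ICE_LINES     # average bits per microcode line

print(f"{total_lines:.0f} lines, {total_bits:.0f} bits, {bits_per_line:.1f} bits/line")
```

This gives 5,000 lines and 240,640 bits (about 240 Kbits), or roughly 48 bits per line on average.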

P6 and later micro-operations

Starting with the Pentium Pro, in most Intel x86 processors, instructions are converted by the instruction fetch and decode unit to sequences of processor-specific micro-operations that are directly executed by the processor. For the instructions that are implemented in microcode, the microcode consists of micro-operations fetched from on-chip memory.[3]

Sources differ on the width of a Pentium Pro micro-operation: 72 bits[4] or 118 bits.[5] Each micro-operation includes an opcode, two source fields and one destination field,[6] and can hold a 32-bit immediate value.[5] The Pentium Pro can detect parity errors in its internal microcode ROM and report them via the Machine Check Architecture.[7]

Micro-operations have a consistent format, with up to three source inputs and two destination outputs. The processor performs register renaming to map these inputs and outputs to and from the retirement register file (RRF) before and after execution. Because execution is out of order, micro-operations, and the instructions they represent, may not execute in program order.
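The following sketch shows the general idea of register renaming described above. It is not a model of the P6's actual structures. It uses a simple dictionary-based register alias table (RAT) with made-up physical tags (`p0`, `p1`, …), and the `Uop` class and `rename` function are hypothetical. Each micro-operation reads its sources through the current mapping and gets a fresh tag for its result, so two writes to the same architectural register no longer conflict:

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class Uop:
    op: str
    dst: str      # architectural destination register
    srcs: tuple   # architectural source registers

def rename(uops, arch_regs=("eax", "ebx", "ecx", "edx")):
    """Rename registers through a register alias table (RAT)."""
    tags = count()
    # Each architectural register starts out mapped to its own physical tag.
    rat = {reg: f"p{next(tags)}" for reg in arch_regs}
    renamed = []
    for uop in uops:
        srcs = tuple(rat[reg] for reg in uop.srcs)  # read the current mappings first
        dst = f"p{next(tags)}"                       # allocate a fresh tag for the result
        rat[uop.dst] = dst                           # later readers see the new tag
        renamed.append((uop.op, dst, srcs))
    return renamed, rat

program = [
    Uop("add", "eax", ("eax", "ebx")),
    Uop("mov", "eax", ("ecx",)),        # overwrites eax: gets its own tag
    Uop("sub", "edx", ("eax", "edx")),  # reads the mov result, not the add result
]
renamed, final_rat = rename(program)
for entry in renamed:
    print(entry)
```

Because the `add` and `mov` results land in different tags, the `mov` does not have to wait for the `add`. An out-of-order core can therefore issue them in whichever order their inputs become ready.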

During development of the Pentium Pro, several microcode fixes were made between the A2 and B0 steppings.[8] For the Pentium II, which is based on the Pentium Pro's P6 microarchitecture, additional micro-operations were added to support the MMX instruction set. In several cases, "microcode assists" were added to handle rare corner cases reliably.

The Pentium 4 can have 126 micro-operations in flight at once. Decoded micro-operations are stored in an Execution Trace Cache with 12,000 entries, to avoid repeatedly decoding the same x86 instructions. Groups of six micro-operations are packed into a trace line, and a micro-operation can borrow extra immediate-data space from a nearby micro-operation in the same cache line.[9] Complex operations, such as exception handling, cause a jump to the microcode ROM. During development of the Pentium 4, microcode accounted for 14% of processor bugs, versus 30% during development of the Pentium Pro.[10]
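Agner Fog's guide (reference 9) describes the borrowing rule roughly as follows. An immediate that fits in a 16-bit signed field needs one trace-cache entry. A larger one takes a second entry unless it can borrow 16 bits from a nearby micro-operation that does not need its own data space. The sketch below encodes only that rule. The boolean `neighbour_has_free_space` is a hypothetical stand-in for whatever the hardware actually checks:

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1

def fits_int16(imm: int) -> bool:
    """True if the immediate is representable as a 16-bit signed integer."""
    return INT16_MIN <= imm <= INT16_MAX

def entries_needed(imm: int, neighbour_has_free_space: bool) -> int:
    """Trace-cache entries used by a micro-op carrying immediate `imm` (simplified)."""
    if fits_int16(imm):
        return 1   # fits in the micro-op's own 16-bit field
    if neighbour_has_free_space:
        return 1   # borrow 16 extra bits from a nearby micro-op
    return 2       # otherwise spill into a second entry

print(entries_needed(100, False), entries_needed(40_000, True), entries_needed(40_000, False))
```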

The Intel Core microarchitecture, introduced in 2006, added "macro-operation fusion" for some common pairs of instructions, such as a comparison followed by a conditional jump. The Core's instruction decoders convert x86 instructions into micro-operations in three different ways:

Conversion of x86 instructions to micro-operations on Core
x86 instructions | x86 decoders | micro-operations
common | simple decoder (× 3) | 1–3
most others | complex decoder (× 1) | ≤ 4
very complex | microcode sequencer | many
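Macro-operation fusion, mentioned above, merges a comparison and the conditional jump that follows it into a single operation before decoding. The following is a simplified sketch over textual assembly. The set of fusible first instructions (`cmp`, `test`) and the `macro_fuse` helper are assumptions for illustration, because the exact fusible pairs differ between CPU generations:

```python
FUSIBLE_FIRST = {"cmp", "test"}  # assumed set of fusible compare instructions

def is_conditional_jump(mnemonic: str) -> bool:
    """Jcc mnemonics start with 'j'; the unconditional 'jmp' is excluded."""
    return mnemonic.startswith("j") and mnemonic != "jmp"

def macro_fuse(instructions):
    """Merge each compare + conditional jump pair into one macro-operation."""
    fused, i = [], 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if (following is not None
                and current.split()[0] in FUSIBLE_FIRST
                and is_conditional_jump(following.split()[0])):
            fused.append(f"{current} + {following}")  # one macro-op for the pair
            i += 2
        else:
            fused.append(current)
            i += 1
    return fused

code = ["mov eax, 1", "cmp eax, ebx", "jne done", "add eax, 2"]
print(macro_fuse(code))
```

Fusing the pair means the decoders emit one operation instead of two for a very common loop-closing idiom.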

For Intel's hyper-threading implementation of simultaneous multithreading, the microcode ROM, trace cache, and instruction decoders are shared, but the micro-operation queue is not shared.

Update facility

In the mid-1990s, a facility for supplying new microcode was initially referred to as the Pentium Pro BIOS Update Feature.[11] It was intended that user-mode applications should make a BIOS interrupt call to supply a new "BIOS Update Data Block", which the BIOS would partially validate and save to nonvolatile BIOS memory; this could be supplied to the installed processors on next boot.

Intel distributed a program called BUP_UTIL.EXE, later renamed CHECKUP3.EXE, which could be run under DOS. Multiple microcode updates were concatenated into numbered collection files with the extension .PDB, such as PEP6.PDB.[12]

Processor interface

The processor boots using a set of microcode stored in an internal ROM. A microcode update populates a separate SRAM and a set of "match registers" that act as breakpoints within the microcode ROM, allowing execution to jump to the updated micro-operations in the SRAM. Each match register pairs a microcode match address with a microcode destination address. The Microcode Instruction Pointer (UIP) is compared against all of the match registers, and any match causes a jump to the corresponding destination microcode address.[1] In the original P6 architecture, the SRAM has space for 60 micro-operations, and there are multiple match/destination register pairs.[1] Jumping from ROM microcode to patched microcode in the SRAM takes one processor instruction cycle.
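The match-and-redirect mechanism can be modelled in a few lines. The following sketch is hypothetical throughout: the class, the addresses and the micro-op names are invented. It represents the ROM, the patch SRAM and the match/destination pairs as plain Python containers. Real hardware compares the UIP against all match registers in parallel, and this model ignores how patched code eventually returns to the ROM:

```python
from dataclasses import dataclass, field

@dataclass
class MicrocodeSequencer:
    rom: dict                                        # ROM address -> micro-op
    patch_ram: dict = field(default_factory=dict)    # SRAM address -> micro-op
    match_regs: list = field(default_factory=list)   # (match address, destination address)

    def fetch(self, uip: int):
        """Return the micro-op executed for microcode instruction pointer `uip`."""
        for match_addr, dest_addr in self.match_regs:
            if uip == match_addr:                  # UIP hits a match register:
                return self.patch_ram[dest_addr]   # redirect to the patched code
        return self.rom[uip]                       # otherwise use the ROM as shipped

rom = {0x10: "load", 0x11: "buggy_add", 0x12: "store"}
patched = MicrocodeSequencer(rom, patch_ram={0x00: "fixed_add"}, match_regs=[(0x11, 0x00)])
unpatched = MicrocodeSequencer(rom)

print([patched.fetch(uip) for uip in (0x10, 0x11, 0x12)])
```

Only the address that hits a match register is redirected. The rest of the ROM runs unchanged, which is why a small SRAM is enough to fix isolated bugs.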

The processor must be running in protection ring 0 to initiate a microcode update. Each CPU in a symmetric multiprocessing system must be updated individually.

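An update is initiated by placing its linear address in the EAX register (EDX holds the upper 32 bits of the address), setting ECX to 0x79 (the IA32_BIOS_UPDT_TRIG model-specific register), and executing WRMSR (write model-specific register).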
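WRMSR writes the 64-bit value in EDX:EAX to the MSR selected by ECX, so a loader only has to split the address across the two registers. The following sketch computes those register values without executing the privileged instruction. It assumes, as in the sample loader in Intel's Software Developer's Manual, that the address passed is that of the update data following the 48-byte header. The function name is hypothetical:

```python
IA32_BIOS_UPDT_TRIG = 0x79   # MSR number loaded into ECX
HEADER_SIZE = 48             # update data follows the 48-byte header (assumption, see above)

def wrmsr_operands(update_linear_addr: int) -> dict:
    """Register values a ring-0 loader would set before executing WRMSR.

    Only computes the values; WRMSR itself is privileged and is not executed here.
    """
    data_addr = update_linear_addr + HEADER_SIZE
    return {
        "ecx": IA32_BIOS_UPDT_TRIG,
        "edx": (data_addr >> 32) & 0xFFFF_FFFF,  # high 32 bits (0 on 32-bit systems)
        "eax": data_addr & 0xFFFF_FFFF,          # low 32 bits
    }

print({reg: hex(val) for reg, val in wrmsr_operands(0x9_0000).items()})
```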

Microcode update format

Intel distributes microcode updates as 2,048-byte (2 KB) binary blobs. Each update records which processors it is intended for, so that this can be checked against the result of the CPUID instruction. The structure is a 48-byte header followed by 2,000 bytes intended to be read directly by the processor being updated:

  1. A microcode program that is executed by the processor during the microcode update process. This microcode is able to reconfigure and enable or disable components using a special register, and it must update the breakpoint match registers.
  2. Up to sixty patched micro-operations to be populated into the SRAM.
  3. Padding consisting of random values, to obfuscate understanding of the format of the microcode update.
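The header is the only part a loader can interpret, because the 2,000 data bytes are encrypted. The following sketch assumes the 48-byte header layout given in Intel's Software Developer's Manual: nine little-endian 32-bit fields (header version, update revision, date, processor signature, checksum, loader revision, processor flags, data size, total size) followed by 12 reserved bytes. It also assumes the manual's rule that all 32-bit words of an update sum to zero. `build_update`, the signature 0x619 and the dummy data are hypothetical test values. Real loaders also check the processor flags against the platform, which is omitted here:

```python
import struct

HEADER_FMT = "<9I12x"   # nine little-endian dwords + 12 reserved bytes = 48 bytes
FIELDS = ("header_version", "update_revision", "date", "processor_signature",
          "checksum", "loader_revision", "processor_flags", "data_size", "total_size")

def parse_header(blob: bytes) -> dict:
    hdr = dict(zip(FIELDS, struct.unpack_from(HEADER_FMT, blob)))
    # A size of 0 denotes the original fixed format: 2,000 data bytes, 2,048 in total.
    hdr["data_size"] = hdr["data_size"] or 2000
    hdr["total_size"] = hdr["total_size"] or 2048
    return hdr

def dword_sum(blob: bytes) -> int:
    return sum(struct.unpack(f"<{len(blob) // 4}I", blob)) & 0xFFFF_FFFF

def checksum_ok(blob: bytes) -> bool:
    """All 32-bit words of the update must sum to zero modulo 2**32."""
    return dword_sum(blob) == 0

def applies_to(hdr: dict, cpuid_signature: int) -> bool:
    """Compare the header's processor signature with the value reported by CPUID."""
    return hdr["processor_signature"] == cpuid_signature

def build_update(signature: int, data: bytes) -> bytes:
    """Hypothetical helper: assemble a header + data blob with a valid checksum."""
    fields = [1, 0x2A, 0x01011996, signature, 0, 1, 0x1, 0, 0]
    unsigned = struct.pack(HEADER_FMT, *fields) + data
    fields[4] = (-dword_sum(unsigned)) & 0xFFFF_FFFF   # make the total wrap to zero
    return struct.pack(HEADER_FMT, *fields) + data

dummy_data = bytes(range(256)) * 7 + bytes(208)   # 2,000 placeholder bytes
blob = build_update(0x619, dummy_data)
hdr = parse_header(blob)
print(len(blob), hex(hdr["processor_signature"]), checksum_ok(blob))
```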

Each block is encoded differently. Most of the 2,000 bytes go unused, because the configuration program and the SRAM micro-operations are much smaller than that. The processor makes the final determination of whether an update can be applied, validating it during decryption.[11] Each microcode update is specific to a particular CPU revision and is designed to be rejected by CPUs with a different stepping level. Microcode updates are encrypted to prevent tampering and to enable validation.[13]

With the Pentium there are two layers of encryption. The precise details are not documented by Intel and are instead known only to fewer than ten employees.[14]

Microcode updates for Intel Atom, Nehalem and Sandy Bridge processors contain an extra 520-byte header that includes a 2048-bit RSA modulus and a public exponent of 17.
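Carrying the modulus and exponent lets the processor check a signature with a single modular exponentiation. The following sketch is textbook RSA, not Intel's scheme. It uses the classic toy key p = 61, q = 53 (modulus 3233) instead of a 2048-bit modulus and keeps only the exponent 17 from the text. It hashes with SHA-256 reduced modulo N and does not model Intel's padding, hash choice or header placement. The unpadded toy key is insecure and is only for illustration. `pow(E, -1, m)` needs Python 3.8 or later:

```python
import hashlib

# Toy textbook-RSA key (classic p=61, q=53 example), NOT a real 2048-bit key.
P, Q = 61, 53
N = P * Q                           # 3233: the public modulus
E = 17                              # public exponent, as in the update header
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, held only by the signer

def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    return pow(digest(message), D, N)                # needs the private exponent

def verify(message: bytes, signature: int) -> bool:
    return pow(signature, E, N) == digest(message)   # needs only N and E

update = b"microcode update payload"
sig = sign(update)
print(verify(update, sig), verify(update, (sig + 1) % N))
```

A small public exponent such as 17 keeps verification cheap, because raising to the 17th power takes only a handful of modular multiplications.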

Observed Intel microcode data-block lengths (in bytes)
Microarchitecture | Example processors | Supplied length | Functional length | Suspected encoding
P6 | | 2000 | 864; 872; 944; 1968 | 64-bit block cipher
Core | PIII … | 4048 | 3096 |
NetBurst | …, Celeron | 2000–7120 | 2000 + N × 1024 | chained block cipher
Atom, Nehalem, Sandy Bridge | | 976–16336 | 976 + N × 1024; 5120 | AES + RSA signature

Debugging

Special debugging microcode can be loaded to enable Extended Execution Trace, which then outputs extra information via the Breakpoint Monitor Pins. On the Pentium 4, loading special microcode gives access to Microcode Extended Execution Trace mode. Through the JTAG Test Access Port (TAP), a pair of Breakpoint Control registers allows breaking on microcode addresses.

During the mid-1980s, NEC and Intel were involved in a long-running US federal court case over microcode copyright.[15] NEC had been a second source for Intel 8086 CPUs with its NEC μPD8086, and held long-term patent and copyright cross-licensing agreements with Intel. In August 1982, Intel sued NEC for copyright infringement over the microcode implementation.[16][17] NEC prevailed by using clean-room software engineering to demonstrate that the similarities between Intel's microcode and that of its V20 and V30 processors were the result of restrictions imposed by the architecture, rather than of copying.[15]

The Intel 386 can perform a built-in self-test (BIST) of its microcode and programmable logic arrays. On completion, EAX holds a signature, which is 00000000h if the test passed.[18] During the BIST, the microprogram counter is reused to walk through all of the ROMs, and the results are collated via a network of multiple-input signature registers (MISRs) and linear-feedback shift registers.[19] On startup of the Intel 486, a hardware-controlled BIST runs for 2^20 (about one million) clock cycles to check various arrays, including the microcode ROM. Control is then transferred to the microcode for further self-testing of registers and computation units.[20] The Intel 486 microcode ROM has 250,000 transistors.[20]
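A multiple-input signature register compacts a long stream of ROM words into one short value. On each clock, the register shifts like an LFSR and XORs the next input word into its stages. After the whole ROM has been read, the result is compared against a known-good signature. The 16-bit width, the feedback polynomial 0x1021 and the zero seed below are hypothetical choices, because the sources do not describe the 386's actual configuration:

```python
def misr_step(state: int, word: int, width: int = 16, poly: int = 0x1021) -> int:
    """One clock of a Galois-style MISR: LFSR shift, then XOR in the input word."""
    mask = (1 << width) - 1
    msb = (state >> (width - 1)) & 1
    state = (state << 1) & mask
    if msb:
        state ^= poly              # feedback taps
    return state ^ (word & mask)   # parallel inputs enter every stage at once

def signature(words, width: int = 16, poly: int = 0x1021, seed: int = 0) -> int:
    """Compact a sequence of ROM words into a single signature."""
    state = seed
    for word in words:
        state = misr_step(state, word, width, poly)
    return state

rom = list(range(64))          # stand-in for microcode ROM contents
good = signature(rom)
faulty = rom.copy()
faulty[10] ^= 0x4              # a single stuck bit in one word
print(hex(good), hex(signature(faulty)))
```

Because this polynomial has its lowest bit set, each shift step is invertible. A single flipped bit therefore always changes the final signature, while multi-bit faults can occasionally alias to the good value.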

AMD had a long-term contract to reuse Intel's 286, 386 and 486 microcode. In October 1994, a court ruled that the agreement did not cover AMD distributing Intel's 486 in-circuit emulation (ICE) microcode.[2]

Direct Access Testing

Direct Access Testing (DAT) is included in Intel CPUs as part of the design for testing (DFT) and design for debug (DFD) initiatives, to allow full-coverage testing of individual CPUs before sale.[21]

In May 2020, researchers used a script that reads directly from the Control Register Bus (CRBUS)[22] to access the Local Direct Access Test (LDAT) port of an Intel Goldmont CPU, and read out its loaded microcode and patch arrays.[23] This first required exploiting "Red Unlock" over JTAG, using a "USB-A to USB-A 3.0 with Debugging Capabilities" cable without the D+, D− and Vcc lines. These arrays are accessible only after the CPU has been put into a specific mode, and consist of five arrays (array select 0–4) accessed through control register offset 0x6A0.[24]


Notes and References

  1. Yeoh Eng Hong; Lim Seong Leong; Wong Yik Choong; Lock Choon Hou; Mahmud Adnan (20 April 1998). "An Overview of Advanced Failure Analysis Techniques for Pentium and Pentium Pro Microprocessors". Intel Technology Journal, Q2 1998. Lin Chao (ed.). "Pentium Pro microprocessor ... Micropatching feature. ... consists of two key elements: the microcode patch RAM and several pairs of Match and Destination registers. ... Microcode Instruction Pointer (UIP) matches the content of a Match register, the UIP will be reloaded with a new address from the Destination register. ... for the reset subroutine can be set in the Match register ... thereby bypassing the reset subroutine altogether."
  2. Trumbull, Patricia V. (7 October 1994). Intel Corporation v. Advanced Micro Devices, No. C-93-20301 PVT. United States District Court for the Northern District of California, San Jose. Findings of fact and conclusions of law following "ICE" module of trial. Retrieved 10 May 2021. "Twelve pins are affiliated with the "ICE" circuitry. … AMD 486DXL and DXLV connect three pins associated with "ICE" in order to implement its "SMM" feature. … 250 lines or 12,032 bits of the "ICE" microcode in the 486. "ICE" constitutes about five percent of the total 486 microcode. … two lines … (used to set the "ICE" mode "flip flop") … blue coded lines of microcode are associated with production testing and not used for "ICE" related purposes. … Seventy-five red coded lines were used by Intel to perform "SMM" in its 486SL, a data sheet function of this version of the chip. About 32 yellow coded lines perform routine operations which are not unique to "ICE." About two lines remain dedicated solely to "ICE.""
  3. Intel. "A Tour of the Pentium Pro Processor Microarchitecture". Archived 20 December 1996 at https://web.archive.org/web/19961220080210/http://www.intel.com/procs/ppro/info/p6white/index.htm.
  4. Kubiatowicz, John (3 May 2004). "Dynamic Scheduling in P6 (Pentium Pro, II, III). Low Power Design, Advanced Intel Processors". CS152 Computer Architecture and Engineering, Lecture 25. "Complex 80x86 instructions are executed by a conventional microprogram (x 72 bits) that issues long sequences of micro-operations."
  5. Gwennap, Linley (16 February 1995). "Intel's P6 Uses Decoupled Superscalar Design". Microprocessor Report 9 (2): 1–7. MicroDesign Resources. S2CID 14414612. Archived 8 October 2018 at https://web.archive.org/web/20181008134943/https://pdfs.semanticscholar.org/fe2b/b73d7046a6ed87ce9b18d62f194d67fa2100.pdf. "P6 uops have a fixed length of 118 bits, using a regular structure to encode an operation, two sources, and a destination. The source and destination fields are each wide enough to contain a 32-bit operand."
  6. Colwell, Robert P.; Steck, Randy L. (12 April 1995). "A 0.6 μm BiCMOS Processor With Dynamic Execution". Intel Corporation. p. 7. Retrieved 27 May 2020. "Micro-ops are the atomic unit of work in the P6 processor and are an opcode, two source and one destination operand. These micro-ops are fixed length and are more general than the Pentium(R) processor's microcode since they need to be scheduled."
  7. Intel (December 1995). "16.6.1. Simple Error Codes". In "Machine Check Architecture", Pentium® Pro Family Developer's Manual, Volume 3: Operating System Writer's Guide. p. 401. http://folk.uio.no/inf242/doc/242692_1.pdf. Retrieved 1 October 2018. "unique codes indicate global error information … Microcode ROM Parity Error."
  8. Papworth, David B. (April 1996). "Tuning the Pentium Pro Microarchitecture". IEEE Micro. Intel Corporation. ISSN 0272-1732. p. 14. Archived 8 October 2018 at https://web.archive.org/web/20181008095801/http://web.cecs.pdx.edu/~berkina/R10_papworth_ieeemicro_1996.pdf. "B0 stepping incorporated several microcode bugs and speed path fixes for problems discovered on the A-step silicon."
  9. Fog, Agner. "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers". Technical University of Denmark. p. 49. Retrieved 25 May 2020. "… If a μop has an immediate 32-bit operand outside the ±2^15 interval so that it cannot be represented as a 16-bit signed integer, then it will use two trace cache entries unless it can borrow storage space from a nearby μop. … A μop in need of extra storage space can borrow 16 bits of extra storage space from a nearby μop that doesn't need its own data space."
  10. Bentley, Bob; Gray, Rand (Q1 2001). "Validating The Intel® Pentium® 4 Processor". Bug Discussion. Intel Technology Journal. Lin Chao (ed.).
  11. Intel (12 January 1996). "8: Pentium Pro Processor BIOS Update Feature". Version 2.0. p. 45. Retrieved 3 November 2020. "authentication procedure relies upon the decryption provided by the processor to verify an update from a potentially hostile sources."
  12. Mueller, Scott; Zacker, Craig (September 1998). Upgrading and Repairing PCs (Tenth Anniversary ed.). Jim Minatel, Jill Byus, Rick Kughen (eds.). Que Publishing. ISBN 0-7897-1636-4. p. 79. Retrieved 1 October 2018. "Processor Steppings (Revisions) and Microcode Update Revisions Supported by the Update Database File PEP6.PDB … Using the processor update utility (CHECKUP3.EXE), … can easily verify … the correct microcode update."
  13. Wolfe, Alexander (30 June 1997). "Intel preps plan to bust bugs in Pentium MPUs". EE Times (Techweb), no. 960. Archived 13 November 1999 at https://web.archive.org/web/19991113012445/http://www.techweb.com/se/directlink.cgi?EET19970630S0007. Retrieved 3 October 2018. "obscure moniker "BIOS Update Feature." … "Each BIOS Update is tailored for a particular stepping of [a] processor," … data block is mapped directly-… after decryption-to the microcode itself."
  14. Wolfe, Alexander (30 June 1997). "Hole seen in Intel's bug-busting feature". EE Times, Santa Clara. Archived 9 March 2003 at https://web.archive.org/web/20030309102752/http://www.eetimes.com/news/97/963news/hole.html. "Ajay Malhortra, a technical marketing manager based here at Intel's microprocessor group. "Not only is the data block containing the microcode patch encrypted, but once the processor examines the header of the BIOS update, there are two levels of encryption in the processor that must occur before it will successfully load the update." … closely guarded secret. "There is no documentation," said Frank Binns, an architect in Intel's microprocessor group. "It's not as if you can get an Intel 'Red Book' with this stuff written down. It's actually in the heads of less than 10 people in the whole of Intel.""
  15. Elkins, David S. (Winter 1990). "NEC v. Intel: A Guide to Using "Clean Room" Procedures as Evidence". Computer/Law Journal 10 (4): 453. "NEC's use of its clean room procedures as trial evidence … Judge Gray defined microcode … within the Copyright Act's definition of a "computer program," … Intel's microcode is copyrightable. … Intel's microcode did not contain the required copyright notice. … copyrights had been forfeited. … Intel was left with no basis for its claim of copying."
  16. Hinckley, Robert C. (January 1987). "NEC v. Intel: Will Hardware Be Drawn into the Black Hole of Copyright Editors'". Santa Clara High Technology Law Journal 3 (1). Appendix: Microcode formats; 8086/8088 format; V20/V30 format.
  17. Leong, Kathy Chin (28 March 1988). "Intel witness recants story". Computerworld 22 (13): 83–84. San Jose. ISSN 0010-4841. Retrieved 2 October 2018.
  18. Intel (December 1995). "Intel386 DX Microprocessor 32-Bit CHMOS Microprocessor with Integrated Memory Management". Datasheet 231630-011. http://pdf.datasheetcatalog.com/datasheet/Intel/mXtuvqv.pdf. Retrieved 3 September 2004. "self-test checks the function of all of the Control ROM … EAX register will contain a signature of 00000000h indicating the Intel386 DX passed its self-test of microcode and major PLA contents."
  19. IIT Kharagpur (7 October 2006). "5.1 Exhaustive Test in the Intel 80386". Testing of Embedded System, Lesson 21: Built-In-Self-Test (BIST) for Embedded Systems. Retrieved 6 October 2018. "For ROMs, the patterns are generated by the microprogram counter which is part of the normal logic."
  20. Gelsinger, Patrick; Iyengar, Sundar; Krauskopf, Joseph; Nadir, James (1989). "Computer Aided Design and Built In Self Test on the i486™ CPU". 1989 IEEE International Conference on Computer Design: VLSI in Computers and Processors. Intel / IEEE. pp. 200–201.
  21. Wu, David M.; Lin, Mike; Reddy, Madhukar; Jaber, Talal; Sabbavarapu, Anil; Thatcher, Larry (2004). "An optimized DFT and test pattern generation strategy for an Intel high performance microprocessor". Intel Corporation. pp. 38, 43, 44. "Direct Access Testing (DAT) for array access and diagnosis and Programmable Weak Write Test Mode (PWWTM) for memory cell stability test to reduce the test time. … Array test strategy is to use PBIST (Programmable Built-In Self Test) to test the second level cache and use DAT to test the remaining arrays … PBIST is available through the JTAG TAP controller. … DAT mode in PX as shown in Figure 4 … PX has more arrays (>110) … array test coverage of PX is 99.3% ‒ the highest in Pentium 4 family."
  22. uCode Research Team (25 May 2020). "chip-red-pill/crbus_scripts". Retrieved 26 May 2020.
  23. Ermolov, Mark (@_markel___) (19 May 2020). Tweet 1262697756805795841. "Using the Local Direct Access Test (LDAT) DFT feature of Intel Atom CPU, we dumped Microcode Sequencer ROM. Also, we extracted what we think is IROM (Immediates for uops) and even managed to modify MS Patch RAM and Match/Patch registers."
  24. Bosch, Peter (22 May 2020). "Intel LDAT notes". Retrieved 26 May 2020. "PDAT CR: 0x6A0; Array Select: 0‒4."