Stack register explained

A stack register is a computer central processor register whose purpose is to keep track of a call stack. On an accumulator-based architecture machine, this may be a dedicated register. On a machine with multiple general-purpose registers, it may be a register that is reserved by convention, such as on the IBM System/360 through z/Architecture architecture and RISC architectures, or it may be a register that procedure call and return instructions are hardwired to use, such as on the PDP-11, VAX, and Intel x86 architectures. Some designs such as the Data General Eclipse had no dedicated register, but used a reserved hardware memory address for this function.

Machines before the late 1960s - such as the PDP-8 and HP 2100 - did not have compilers which supported recursion. Their subroutine instructions typically would save the current location in the jump address, and then set the program counter to the next address. While this is simpler than maintaining a stack, since there is only one return location per subroutine code section, there cannot be recursion without considerable effort on the part of the programmer.

A stack machine has 2 or more stack registers - one of them keeps track of a call stack, the other(s) keep track of other stack(s).

Stack registers in x86

In 8086, the main stack register is called "stack pointer" (SP). The stack segment register (SS) is usually used to store information about the memory segment that stores the call stack of currently executed program. SP points to current stack top. By default, the stack grows downward in memory, so newer values are placed at lower memory addresses. To save a value to the stack, the PUSH instruction is used. To retrieve a value from the stack, the POP instruction is used.

Example: Assuming that SS = 1000h and SP = 0xF820. This means that current stack top is the physical address 0x1F820 (this is due to memory segmentation in 8086). The next two machine instructions of the program are:

PUSH AXPUSH BX

This illustrates how PUSH works. Usually, the running program pushes registers to the stack to make use of the registers for other purposes, like to call a routine that may change the current values of registers. To restore the values stored at the stack, the program shall contain machine instructions like this:

POP BXPOP AX

Stack engine

Simpler processors store the stack pointer in a regular hardware register and use the arithmetic logic unit (ALU) to manipulate its value. Typically push and pop are translated into multiple micro-ops, to separately add/subtract the stack pointer, and perform the load/store in memory.

Newer processors contain a dedicated stack engine to optimize stack operations. Pentium M was the first x86 processor to introduce a stack engine. In its implementation, the stack pointer is split among two registers: ESPO, which is a 32-bit register, and ESPd, an 8-bit delta value that is updated directly by stack operations. PUSH, POP, CALL and RET opcodes operate directly with the ESPd register. If ESPd is near overflow or the ESP register is referenced from other instructions (when ESPd ≠ 0), a synchronisation micro-op is inserted that updates the ESPO using the ALU and resets ESPd to 0. This design has remained largely unmodified in later Intel processors, although ESPO has been expanded to 64 bits.

A stack engine similar to Intel's was also adopted in the AMD K8 microarchitecture. In Bulldozer, the need for synchronization micro-ops was removed, but the internal design of the stack engine is not known.