Shelving buffer explained

A shelving buffer is a technique used in computer processors to increase the efficiency of superscalar processors. It allows for multiple instructions to be dispatched at once regardless of the data dependencies between those instructions. This allows for out-of-order execution to occur which increases the throughput of the microprocessor.

Background

A superscalar processor allows the execution of a number of instructions simultaneously in the core of the processor itself, although this behavior is not to be confused with a multi-processor system. Most modern processors are superscalar. In a superscalar processor multiple instructions are dispatched from the same thread. Multi-core processors contain multiple processors all executing separate threads.

Problems with data dependencies

Executing instructions in parallel (i.e. simultaneously) raises problems with data dependencies, meaning that some instructions may be dependent on the results of others, and hence care must be taken to execute in the correct order.

Take for example these sequence of instructions:

r1 = r2 + r3
r7 = r1 + r4

The update to r7 introduces a (Read After Write) data dependency. The first line of instructions must complete before the second begins execution, as r7 requires the correct value of r1 (register 1) to be known prior to execution. This type of instruction cannot be executed concurrently or simultaneously, the order-of-operations is implicitly serial.

How it works

With a superscalar processor, the instruction window of the processor fills up with a number of instructions (known as the issue rate). Depending on the scheme that the superscalar processor uses to dispatch these instruction from the window to the execution core of the CPU, there may be problems if there is a dependency not unlike the one shown above.

Consider an instruction window 3 instructions wide, containing i1, i2, i3 (instructions 1,2 & 3). Suppose that i2 is dependent on an instruction that has not yet finished executing, and it cannot be executed yet.

Without the use of a shelving buffer, the superscalar processor will execute i1, wait until i2 can be executed and then execute i2 and i3 simultaneously.

However, with the use of a shelving buffer, the instruction window will be emptied into shelving buffers regardless of contents. The processor will then search for an appropriate number of instructions in the shelving buffers that can be executed in parallel (i.e. with no dependencies).

Hence the processor has a greater chance of running the maximum number of instructions simultaneously, and maximising throughput.