VEX prefix explained

The VEX prefix (from "vector extensions") and VEX coding scheme are an extension to the IA-32 and x86-64 instruction set architecture for microprocessors from Intel, AMD and others.

Features

The VEX coding scheme allows the definition of new instructions and the extension or modification of previously existing instruction codes. This serves the following purposes:

The VEX prefix replaces the most commonly used instruction prefix bytes and escape bytes. In many cases, the number of prefix bytes and escape bytes that are replaced is the same as the number of bytes in the VEX prefix, so that the total length of the VEX-encoded instruction is the same as the length of the legacy instruction code. In other cases, the VEX-encoded version is longer or shorter than the legacy code. In 32-bit mode, VEX-encoded instructions can access only the first eight YMM/XMM registers; the encodings for the other registers would be interpreted as the legacy LDS and LES instructions, which are not supported in 64-bit mode.

Instruction encoding

Intel 64 instruction format using VEX prefix
Part          [Prefixes]  [VEX]    OPCODE  ModR/M  [SIB]  [DISP]      [IMM]
No. of bytes              0, 2, 3  1       1       0, 1   0, 1, 2, 4  0, 1

The VEX coding scheme uses a code prefix consisting of two or three bytes, which may be added to existing or new instruction codes.[1]

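In x86 architecture, instructions with a memory operand may use the ModR/M byte, which specifies the addressing mode. This byte has three bit-fields: mod (bits 7–6), which together with r/m selects the addressing mode; reg (bits 5–3), which specifies a register operand or, for some opcodes, an extension of the opcode; and r/m (bits 2–0), which specifies either a register operand or, together with mod, a memory operand.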

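The base-plus-index and scale-plus-index forms of 32-bit addressing (encoded with r/m = 100 and mod ≠ 11) require another addressing byte, the SIB byte. It has the following fields: scale (bits 7–6), which multiplies the index by 1, 2, 4 or 8; index (bits 5–3), which selects the index register; and base (bits 2–0), which selects the base register.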
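A minimal Python sketch of splitting the ModR/M and SIB bytes into the bit-fields described above. The function names are illustrative only, and the SIB rule assumes 32- or 64-bit addressing.

```python
def split_modrm(byte: int) -> tuple[int, int, int]:
    """Split a ModR/M byte into its (mod, reg, r/m) bit-fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def split_sib(byte: int) -> tuple[int, int, int]:
    """Split a SIB byte into (scale, index, base); the factor is 1 << scale."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def needs_sib(modrm: int) -> bool:
    """In 32/64-bit addressing, a SIB byte follows when r/m = 100 and mod != 11."""
    mod, _, rm = split_modrm(modrm)
    return rm == 0b100 and mod != 0b11


# 0xC2 = 11 000 010: register-direct form, reg = 0, r/m = 2
print(split_modrm(0xC2))  # (3, 0, 2)
# 0x04 = 00 000 100: memory operand that needs a SIB byte
print(needs_sib(0x04))    # True
# SIB 0x88 = 10 001 000: scale factor 4, index = 1, base = 0
print(split_sib(0x88))    # (2, 1, 0)
```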

REX and VEX encoding (columns 7 to 0 are bit positions; ~ marks a bit-field stored in inverted form)
Prefix             Byte  7    6    5    4    3    2    1    0
REX                0     0    1    0    0    W    R    X    B    (0x40–0x4F)
VEX3 (3-byte VEX)  0     1    1    0    0    0    1    0    0    (0xC4)
                   1     ~R   ~X   ~B   m4   m3   m2   m1   m0
                   2     W    ~v3  ~v2  ~v1  ~v0  L    p1   p0
VEX2 (2-byte VEX)  0     1    1    0    0    0    1    0    1    (0xC5)
                   1     ~R   ~v3  ~v2  ~v1  ~v0  L    p1   p0
REX2 (2-byte REX)  0     1    1    0    1    0    1    0    1    (0xD5)
                   1     M0   R4   X4   B4   W    R3   X3   B3

The REX prefix provides additional space for encoding 64-bit addressing modes and additional registers present in the x86-64 architecture. Bit-field W changes the operand size to 64 bits, R expands reg to 4 bits, B expands r/m (or opreg in the few opcodes that encode the register in the 3 lowest opcode bits, such as "POP reg"), and X and B expand index and base in the SIB byte.
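To make the REX layout concrete, the following sketch assembles a REX byte and splits a 4-bit register number into the REX extension bit and the three low bits that go into ModR/M. The helper names are hypothetical; the example assumes the MOV r/m64, r64 form (opcode 0x89) with r9 as the destination and rax as the source.

```python
def rex(w: int, r: int, x: int, b: int) -> int:
    """Build a REX prefix byte: high nibble 0100, low nibble W R X B."""
    if any(bit not in (0, 1) for bit in (w, r, x, b)):
        raise ValueError("REX fields are single bits")
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


def split_reg(n: int) -> tuple[int, int]:
    """Split a register number 0-15 into (extension bit, low three bits)."""
    if not 0 <= n <= 15:
        raise ValueError("register number out of range")
    return n >> 3, n & 0b111


# MOV r/m64, r64 with r/m = r9 (0b1001) and reg = rax (0):
# REX.W = 1 selects 64-bit operand size, REX.B = 1 supplies bit 3 of r9.
ext_b, rm_low = split_reg(9)
ext_r, reg_low = split_reg(0)
modrm = (0b11 << 6) | (reg_low << 3) | rm_low
encoding = bytes([rex(1, ext_r, 0, ext_b), 0x89, modrm])
print(encoding.hex(" "))  # 49 89 c1
```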

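The VEX3 prefix contains all bit-fields from the REX prefix as well as various other prefixes, expanding addressing mode, register enumeration, operand size and width: the R, X and B bits (stored inverted) extend ModR/M.reg, SIB.index and ModR/M.r/m or SIB.base as in REX; m4–m0 select the opcode map, replacing the escape bytes 0F, 0F 38 and 0F 3A (values 00001, 00010 and 00011); W acts as REX.W or as an opcode extension, depending on the instruction; vvvv (stored inverted, 1111 when unused) names an additional register operand; L selects the vector length (0 for 128 bits, 1 for 256 bits); and p1–p0 stand in for a legacy 66, F3 or F2 prefix (values 01, 10 and 11; 00 means none).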

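The VEX2 prefix is a 2-byte variant of the VEX3 prefix that differs from the latter in the following points: it has no X, B or W bits, which are implied to be 0, and no m4–m0 field, which is implied to be 00001 (the 0F opcode map).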

Instructions that require any of these bit-fields need to be encoded with the VEX3 prefix.
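The choice between the two forms can be sketched as an encoder that falls back to VEX3 only when a field the 2-byte form cannot express is needed. The function names are hypothetical, the inverted bit-fields are handled internally, and the example assumes the AVX encoding of VADDPS on YMM registers (VEX.256, 0F map, opcode 0x58, no implied prefix).

```python
def vex_prefix(r: int, x: int, b: int, mmmmm: int, w: int,
               vvvv: int, l: int, pp: int) -> bytes:
    """Encode a VEX prefix, preferring the 2-byte form when it suffices.

    r, x, b, w, l: single bits; r, x, b are given in normal (non-inverted) sense.
    mmmmm: opcode map (1 = 0F, 2 = 0F 38, 3 = 0F 3A).
    vvvv:  extra register number 0-15 (0 when unused, which encodes as 1111).
    pp:    implied legacy prefix (0 = none, 1 = 66, 2 = F3, 3 = F2).
    """
    inv_vvvv = ~vvvv & 0xF                        # vvvv is stored inverted
    if x == 0 and b == 0 and w == 0 and mmmmm == 1:
        # VEX2 can express this: X = B = W = 0 and map 0F are implied.
        return bytes([0xC5, ((1 - r) << 7) | (inv_vvvv << 3) | (l << 2) | pp])
    byte1 = ((1 - r) << 7) | ((1 - x) << 6) | ((1 - b) << 5) | mmmmm
    byte2 = (w << 7) | (inv_vvvv << 3) | (l << 2) | pp
    return bytes([0xC4, byte1, byte2])


def vaddps_ymm(dst: int, src1: int, src2: int) -> bytes:
    """Hypothetical helper: VADDPS ymm(dst), ymm(src1), ymm(src2)."""
    prefix = vex_prefix(r=dst >> 3, x=0, b=src2 >> 3, mmmmm=1, w=0,
                        vvvv=src1, l=1, pp=0)
    modrm = 0xC0 | ((dst & 7) << 3) | (src2 & 7)  # register-direct ModR/M
    return prefix + bytes([0x58, modrm])


print(vaddps_ymm(0, 1, 2).hex(" "))   # c5 f4 58 c2
print(vaddps_ymm(0, 1, 10).hex(" "))  # c4 c1 74 58 c2
```

The second call needs VEX.B to reach ymm10, so the sketch switches to the 3-byte form.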

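The REX2 prefix is a 2-byte variant of the REX prefix, introduced with the Intel APX extensions, which add 16 extended general-purpose registers (R16–R31); its R4, X4 and B4 bits supply the fifth register-number bit needed to address them.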

Register addressing in 64-bit mode using VEX prefix
Addressing mode         Bit 3       Bits [2:0]    Register type                  Common usage
REG                     VEX.R       ModRM.reg     General purpose, mask, vector  Register operand
RM (if ModRM.mod = 11)  VEX.B       ModRM.r/m     GPR, mask, vector              Register operand
RM                      VEX.B       ModRM.r/m     GPR                            Register memory address
BASE                    VEX.B       SIB.base      GPR                            Base + index × scale memory address
INDEX                   VEX.X       SIB.index     GPR                            Base + index × scale memory address
VIDX                    VEX.X       SIB.index     Vector                         Base + vector index × scale memory address
NDS/NDD                 VEX.vvvv (all four bits)  GPR, mask, vector              Register operand
IS4                     Imm8[7:4] (all four bits) Vector                         Register operand
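Decoding works the other way around: each register number is assembled from a VEX bit and a ModR/M field as listed in the table. A minimal sketch, assuming 64-bit mode, a register-form ModR/M byte and no SIB byte; the function and its dictionary keys are hypothetical.

```python
def decode_vex(code: bytes) -> dict:
    """Decode a VEX prefix plus the opcode and ModR/M byte that follow it.

    Returns un-inverted fields and full 4-bit register numbers for the
    REG, register-form RM and NDS/NDD operands (no SIB handling).
    """
    if code[0] == 0xC5:                       # VEX2
        r = 1 - (code[1] >> 7)
        x, b, w, mmmmm = 0, 0, 0, 1           # implied by the 2-byte form
        last = code[1]
        rest = code[2:]
    elif code[0] == 0xC4:                     # VEX3
        r = 1 - (code[1] >> 7)
        x = 1 - ((code[1] >> 6) & 1)
        b = 1 - ((code[1] >> 5) & 1)
        mmmmm = code[1] & 0x1F
        w = code[2] >> 7
        last = code[2]
        rest = code[3:]
    else:
        raise ValueError("not a VEX prefix")
    opcode, modrm = rest[0], rest[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    return {
        "map": mmmmm, "W": w, "X": x,
        "L": (last >> 2) & 1,
        "pp": last & 0b11,
        "opcode": opcode,
        "reg": (r << 3) | reg,                           # REG: VEX.R + ModRM.reg
        "rm": ((b << 3) | rm) if mod == 0b11 else None,  # RM, register form only
        "vvvv": ~(last >> 3) & 0xF,                      # NDS/NDD, stored inverted
    }


# VADDPS ymm0, ymm1, ymm10 (VEX3, because r/m = ymm10 needs VEX.B)
fields = decode_vex(bytes.fromhex("c4c17458c2"))
print(fields["reg"], fields["vvvv"], fields["rm"])  # 0 1 10
```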

Technical description

Instructions coded with the VEX prefix can have up to four variable operands (in registers or memory) and one constant operand (immediate value). Instructions that need more than three variable operands use the upper four bits of an 8-bit immediate to specify the fourth register operand (IS4 above). At most one of the operands can be a memory operand, and at most one can be an immediate constant of 4 or 8 bits. The remaining operands are registers.
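The IS4 convention amounts to simple nibble packing, as used for example by the fourth (mask) operand of AVX's VBLENDVPS. A minimal sketch with hypothetical helper names; what the low four bits mean, if anything, depends on the instruction.

```python
def encode_is4(reg: int, low: int = 0) -> int:
    """Return an imm8 whose bits [7:4] name register `reg` (0-15)."""
    if not 0 <= reg <= 15 or not 0 <= low <= 15:
        raise ValueError("field out of range")
    return (reg << 4) | low


def decode_is4(imm8: int) -> int:
    """Recover the register number held in imm8[7:4]."""
    return (imm8 >> 4) & 0xF


# xmm4 -> 0x40, xmm12 -> 0xC0: four bits are enough for all 16 vector registers
print(hex(encode_is4(4)), hex(encode_is4(12)), decode_is4(0xC0))  # 0x40 0xc0 12
```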

The AVX instruction set was the first instruction set extension to use the VEX coding scheme. AVX uses the VEX prefix only for SIMD instructions operating on the XMM and YMM registers.

However, the VEX coding scheme has been used for other instruction types as well in subsequent expansions of the instruction set. For example:

The VEX prefix's initial-byte values, 0xC4 and 0xC5, are the same as the opcodes of the LES and LDS instructions, respectively. These instructions are not supported in 64-bit mode; in 32-bit mode, the ambiguity is resolved by exploiting the fact that the ModR/M byte of a legal LES or LDS instruction cannot specify a register source operand, i.e., cannot be of the form 11xxxxxx. Various bit-fields in the VEX prefix's second byte are stored inverted so that, in 32-bit mode, this byte always has that form. Similarly, in 64-bit mode the one-byte REX prefix, whose four high-order bits are 0100, takes over the sixteen opcodes 0x40–0x4F. Previously, those opcodes were the one-byte INC and DEC instructions for the eight general-purpose registers; x86-64 code must use the ModR/M forms of INC and DEC instead.[5]
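These disambiguation rules can be sketched as a tiny classifier. This is a simplification under stated assumptions: it looks only at the first two bytes, ignores legacy prefixes and the other checks a real decoder performs, and the function names are hypothetical.

```python
def classify_c4_c5(first: int, second: int, mode64: bool) -> str:
    """Tell a VEX prefix apart from LES (0xC4) / LDS (0xC5) by the next byte."""
    if first not in (0xC4, 0xC5):
        raise ValueError("expected 0xC4 or 0xC5")
    vex = "VEX3" if first == 0xC4 else "VEX2"
    if mode64:
        return vex                   # LES/LDS do not exist in 64-bit mode
    if second >> 6 == 0b11:
        return vex                   # mod = 11 would be an illegal LES/LDS form
    return "LES" if first == 0xC4 else "LDS"


def is_rex(byte: int, mode64: bool) -> bool:
    """0x40-0x4F are REX prefixes in 64-bit mode but INC/DEC in 32-bit mode."""
    return mode64 and (byte & 0xF0) == 0x40


print(classify_c4_c5(0xC5, 0xF4, mode64=False))               # VEX2
print(classify_c4_c5(0xC4, 0x06, mode64=False))               # LES
print(is_rex(0x48, mode64=True), is_rex(0x48, mode64=False))  # True False
```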

Legacy SIMD instructions with a VEX prefix added are equivalent to the same instructions without VEX prefix with the following differences:

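Instructions that use the whole 256-bit YMM register should not be mixed with non-VEX (legacy SSE) instructions, which leave the upper half of the register unchanged, because switching between the two can incur significant performance penalties; executing VZEROUPPER before legacy SSE code avoids them.[6] [7]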

The VEX prefix is not supported in real mode and virtual-8086 mode (all instructions with the VEX prefix will cause #UD in these modes).

History

Notes and References

  1. Intel Advanced Vector Extensions Programming Reference. Intel Corporation. January 2009.
  2. Intel® Xeon Phi™ Coprocessor Instruction Set Architecture Reference Manual. Sep 7, 2012. p. 73. Document 327364-001. Archived Aug 4, 2021: https://web.archive.org/web/20210804022347/https://software.intel.com/content/dam/develop/external/us/en/documents/327364001en.pdf
  3. Intel® Architecture Instruction Set Extensions and Future Features. Sep 2023. p. 103. Document 314933-050. Archived Dec 12, 2023: https://web.archive.org/web/20231212013831/https://cdrdv2-public.intel.com/790021/architecture-instruction-set-extensions-programming-reference.pdf
  4. Intel. Software Developer's Manual, order no. 325462-081, Sep 2023, vol. 2, section 2.7.11.3, p. 588. Archived Dec 6, 2023.
  5. Intel® 64 and IA-32 Architectures Developer's Manual: Vol. 2A. Intel Corporation. p. 2-8. 2016-09-01. 2021-09-13.
  6. Intel. Avoiding AVX-SSE Transition Penalties. 2011. Archived Oct 26, 2023.
  7. Stack Overflow. Why is this SSE code 6 times slower without VZEROUPPER on Skylake? December 2016. Archived Jul 6, 2023.
  8. 128-Bit SSE5 Instruction Set. AMD Developer Central. 2009-06-02.
  9. Joel Hruska. AMD Fusion now pushed back to 2011. Ars Technica. November 14, 2008.
  10. Intel Software Network. 2008-04-05. Archived Apr 7, 2008 (original link dead): https://web.archive.org/web/20080407095317/http://softwareprojects.intel.com/avx/
  11. AMD and Intel incompatible - What to do? AMD Developer Forums. 2012-08-10.
  12. AMD64 Architecture Programmer's Manual Volume 4: 128-Bit and 256-Bit Media Instructions. AMD. December 22, 2010.
  13. Dave Christie. Striking a balance. AMD Developer blogs. 2012-08-10. Archived Nov 9, 2013 (original link dead): https://archive.today/20131109140737/http://developer.amd.com/2009/05/06/striking-a-balance/