This section lists instructions that were at some point documented for one or more x86 processors, but whose processor series have since been discontinued or superseded, with no known plans to reintroduce them.
The following instructions were introduced in the Intel 80386, but later discontinued:
Instruction | Opcode | Description | Eventual fate |
---|---|---|---|
XBTS r, r/m | 0F A6 /r | Extract Bit String | Discontinued from revision B1 of the 80386 onwards. Opcodes briefly reused for CMPXCHG in early Intel 486 steppings, and later reused for VIA PadLock. |
IBTS r/m, r | 0F A7 /r | Insert Bit String | Same as XBTS. |
MOV r32,TRx | 0F 24 /r | Move from test register | Present in Intel 386 and 486, but not in Intel Pentium or any later Intel CPU (except the i486-derived Quark X1000). Present in all Cyrix CPUs. |
MOV TRx,r32 | 0F 26 /r | Move to test register | Same as MOV r32,TRx. |
These instructions are only present in the x86 operation mode of early Intel Itanium processors with hardware support for x86. This support was added in "Merced" and removed in "Montecito", replaced with software emulation.
See main article: Intel MPX.
These instructions were introduced in 6th generation Intel Core "Skylake" CPUs. The last CPU generation to support them was the 9th generation Core "Coffee Lake" CPUs.
Intel MPX adds four new registers, BND0 to BND3, each of which holds a pair of addresses (a lower and an upper bound). MPX also defines a bounds-table: a two-level directory/table data structure in memory that contains sets of upper/lower bounds.
Instruction | Opcode | Description |
---|---|---|
BNDMK b, m | F3 0F 1B /r | Make lower and upper bounds from a memory address expression. The lower bound is given by the base component of the address, the upper bound by the 1's complement of the address as a whole. |
BNDCL b, r/m | F3 0F 1A /r | Check address against lower bound. |
BNDCU b, r/m | F2 0F 1A /r | Check address against upper bound, with the upper bound held in 1's-complement form. |
BNDCN b, r/m | F2 0F 1B /r | Check address against upper bound, with the upper bound not in 1's-complement form. |
BNDMOV b, b/m<br/>BNDMOV b/m, b | 66 0F 1A /r<br/>66 0F 1B /r | Move a pair of memory bounds to/from memory or between bounds-registers. |
BNDLDX b, mib | NP 0F 1A /r | Load bounds from the bounds-table, with address translation performed using an sib-addressing expression mib. |
BNDSTX mib, b | NP 0F 1B /r | Store bounds into the bounds-table, with address translation performed using an sib-addressing expression mib. |
BND | F2 | Instruction prefix used with certain branch instructions to indicate that they should not clear the bounds registers. |
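The bound-register behaviour described in the table can be modelled in a few lines. The Python sketch below is a simplified illustration, not Intel's specification: it covers only BNDMK, BNDCL, BNDCU and BNDCN in 64-bit mode, stands in for the #BR exception with a hypothetical `BoundRangeError` class, and ignores the bounds-table instructions. All class and function names are hypothetical.

```python
# Simplified model of MPX bound checks in 64-bit mode (illustrative only).
# BoundRegister, BoundRangeError and the function names are hypothetical.
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


class BoundRangeError(Exception):
    """Stands in for the #BR (bound range exceeded) exception."""


@dataclass
class BoundRegister:
    lower: int = 0
    upper_stored: int = 0  # upper bound kept in 1's-complement form


def bndmk(base: int, address: int) -> BoundRegister:
    """BNDMK: lower bound = base component, upper bound = ~(whole address)."""
    return BoundRegister(lower=base & MASK64, upper_stored=~address & MASK64)


def bndcl(b: BoundRegister, address: int) -> None:
    """BNDCL: fault if the address is below the lower bound."""
    if address < b.lower:
        raise BoundRangeError(f"{address:#x} is below {b.lower:#x}")


def bndcu(b: BoundRegister, address: int) -> None:
    """BNDCU: un-complement the stored upper bound, then compare."""
    upper = ~b.upper_stored & MASK64
    if address > upper:
        raise BoundRangeError(f"{address:#x} is above {upper:#x}")


def bndcn(b: BoundRegister, address: int) -> None:
    """BNDCN: compare against the stored upper bound as-is (no complement)."""
    if address > b.upper_stored:
        raise BoundRangeError(f"{address:#x} is above {b.upper_stored:#x}")


if __name__ == "__main__":
    # A 16-byte buffer at 0x1000: the upper bound is its last valid byte, 0x100F.
    buf = bndmk(0x1000, 0x1000 + 16 - 1)
    bndcl(buf, 0x1000)
    bndcu(buf, 0x100F)
    try:
        bndcu(buf, 0x1010)
    except BoundRangeError as err:
        print("caught:", err)
```

In this model an all-zero register (`BoundRegister()`) accepts every address under BNDCL and BNDCU, because complementing a stored upper bound of 0 yields 2^64 − 1; that property is one plausible motivation for keeping the upper bound complemented.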
The Hardware Lock Elision (HLE) feature of Intel TSX is marked in the Intel SDM as removed from 2019 onwards.[2] This feature took the form of two instruction prefixes, XACQUIRE and XRELEASE, that could be attached to memory atomics/stores to elide the memory locking that they represent.
Instruction prefix | Opcode | Description |
---|---|---|
XACQUIRE | F2 | Instruction prefix to indicate the start of hardware lock elision, used with memory atomic instructions only (for other instructions, the F2 prefix may have other meanings). When used with such instructions, it may start a transaction instead of performing the memory atomic operation. |
XRELEASE | F3 | Instruction prefix to indicate the end of hardware lock elision, used with memory atomic/store instructions only (for other instructions, the F3 prefix may have other meanings). When used with such instructions during hardware lock elision, it ends the associated transaction instead of performing the store/atomic. |
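To make the prefix mechanism concrete, the Python sketch below hand-assembles two HLE-prefixed instructions as raw bytes: an `XACQUIRE LOCK CMPXCHG` that would begin an elided critical section and an `XRELEASE MOV` store that would end it. The operand choices ([RDI], ESI) and the helper `with_hle_prefix` are hypothetical, and the prefix order shown is only one valid ordering of the legacy prefixes.

```python
# Hand-assembled 64-bit encodings showing where the HLE prefixes go.
XACQUIRE = 0xF2  # same byte as REPNE
XRELEASE = 0xF3  # same byte as REP/REPE
LOCK = 0xF0

# lock cmpxchg dword [rdi], esi -> F0 0F B1 /r, ModRM 0x37 (mod=00, reg=esi, rm=rdi)
LOCK_CMPXCHG = bytes([LOCK, 0x0F, 0xB1, 0x37])
# mov dword [rdi], 0 -> C7 /0 id, ModRM 0x07 (mod=00, reg=0, rm=rdi), imm32 = 0
MOV_STORE_ZERO = bytes([0xC7, 0x07, 0x00, 0x00, 0x00, 0x00])


def with_hle_prefix(prefix: int, encoding: bytes) -> bytes:
    """Prepend an HLE prefix byte to an already-encoded instruction (hypothetical helper)."""
    if prefix not in (XACQUIRE, XRELEASE):
        raise ValueError(f"{prefix:#04x} is not an HLE prefix")
    return bytes([prefix]) + encoding


acquire = with_hle_prefix(XACQUIRE, LOCK_CMPXCHG)    # xacquire lock cmpxchg [rdi], esi
release = with_hle_prefix(XRELEASE, MOV_STORE_ZERO)  # xrelease mov dword [rdi], 0

if __name__ == "__main__":
    print(acquire.hex(" "))  # f2 f0 0f b1 37
    print(release.hex(" "))  # f3 c7 07 00 00 00 00
```

Because XACQUIRE and XRELEASE reuse the F2 and F3 prefix bytes, a processor without HLE support ignores them on these instructions and simply performs the locked operation and the store as usual.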
The VP2INTERSECT instructions (an AVX-512 subset) were introduced in Tiger Lake (11th generation mobile Core processors), but were never officially supported on any other Intel processors; they are now considered deprecated[3] and are listed in the Intel SDM as removed from 2023 onwards.
As of July 2024, the VP2INTERSECT instructions have been re-introduced on AMD Zen 5 processors.[4]
Instruction | Opcode | Description |
---|---|---|
VP2INTERSECTD k1+1, ymm2, ymm3/m256/m32bcst<br/>VP2INTERSECTD k1+1, zmm2, zmm3/m512/m32bcst | EVEX.F2.0F38.W0 68 /r | Store, in an even/odd pair of mask registers, the indicators of the locations of value matches between 32-bit lanes in the two vector source arguments. |
VP2INTERSECTQ k1+1, ymm2, ymm3/m256/m64bcst<br/>VP2INTERSECTQ k1+1, zmm2, zmm3/m512/m64bcst | EVEX.F2.0F38.W1 68 /r | Store, in an even/odd pair of mask registers, the indicators of the locations of value matches between 64-bit lanes in the two vector source arguments. |
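The description above amounts to an all-pairs comparison. The Python sketch below models the two match masks for equal-length lists of lane values; it illustrates the described behaviour rather than reproducing Intel's pseudocode, and the function name is hypothetical. Bit i of the even mask is set when lane i of the first source equals any lane of the second source, and bit j of the odd mask is set when lane j of the second source equals any lane of the first.

```python
from typing import Sequence, Tuple


def vp2intersect(src1: Sequence[int], src2: Sequence[int]) -> Tuple[int, int]:
    """Return (even_mask, odd_mask) for lane-wise value matches (hypothetical model)."""
    if len(src1) != len(src2):
        raise ValueError("both sources must have the same number of lanes")
    even_mask = 0  # bit i set if src1[i] matches some lane of src2
    odd_mask = 0   # bit j set if src2[j] matches some lane of src1
    for i, a in enumerate(src1):
        for j, b in enumerate(src2):
            if a == b:
                even_mask |= 1 << i
                odd_mask |= 1 << j
    return even_mask, odd_mask


if __name__ == "__main__":
    k_even, k_odd = vp2intersect([1, 2, 3, 4], [4, 5, 1, 1])
    print(bin(k_even), bin(k_odd))  # 0b1001 0b1101
```

A lane can match several lanes on the other side (here the value 1 matches two lanes of the second source), which is why each mask records only "matched at least once" per lane rather than the positions of the matches.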
The first generation Xeon Phi processors, codenamed "Knights Corner" (KNC), supported a large number of instructions that are not seen in any later x86 processor. An instruction reference is available;[5] the instructions/opcodes unique to KNC are the ones with VEX and MVEX prefixes, except for the KMOV, KNOT and KORTEST instructions, which are kept with the same opcodes and function in AVX-512 but with a "W" appended to their instruction names.
Most of these KNC-unique instructions are similar but not identical to instructions in AVX-512; later Xeon Phi processors replaced them with AVX-512.
Early versions of AVX-512 avoided the instruction encodings used by KNC's MVEX prefix; however, with the introduction of Intel APX (Advanced Performance Extensions) in 2023, some of the old KNC MVEX instruction encodings have been reused for new APX encodings. For example, both KNC and APX accept the instruction encoding as valid, but assign different meanings to it:
Some of the AVX-512 instructions in the Xeon Phi "Knights Landing" and later models belong to the AVX-512 subsets "AVX512ER", "AVX512_4FMAPS", "AVX512PF" and "AVX512_4VNNIW", all of which are unique to the Xeon Phi series of processors. The ER and PF subsets were introduced in "Knights Landing"; the 4FMAPS and 4VNNIW instructions were later added in "Knights Mill". The ER and 4FMAPS instructions are floating-point arithmetic instructions that all follow a given pattern where:
Operation | AVX-512 subset | Basic opcode | FP32 packed (W=0) | FP32 scalar (W=0) | FP64 packed (W=1) | FP64 scalar (W=1) | RC/SAE |
---|---|---|---|---|---|---|---|
Xeon Phi specific instructions (ER, 4FMAPS) | | | | | | | |
Reciprocal approximation with an accuracy of $2^{-28}$ | ER | | | | | | SAE |
Reciprocal square root approximation with an accuracy of $2^{-28}$ | ER | | | | | | SAE |
Exponential $2^x$ with an accuracy of $2^{-23}$ | ER | EVEX.66.0F38 C8 /r | VEXP2PS z,z/m512 | | VEXP2PD z,z/m512 | | SAE |
Fused-multiply-add, 4 iterations | 4FMAPS | | | | | | |
Fused negate-multiply-add, 4 iterations | 4FMAPS | | | | | | |
Operation | Basic opcode | 32-bit indexes (opcode C6), FP32 prefetch (W=0) | 32-bit indexes (opcode C6), FP64 prefetch (W=1) | 64-bit indexes (opcode C7), FP32 prefetch (W=0) | 64-bit indexes (opcode C7), FP64 prefetch (W=1) |
---|---|---|---|---|---|
Prefetch into L1 cache (T0 hint) | | | | | |
Prefetch into L2 cache (T1 hint) | | | | | |
Prefetch into L1 cache (T0 hint) with intent to write | | | | | |
Prefetch into L2 cache (T1 hint) with intent to write | | | | | |
Instruction | Opcode | Description |
---|---|---|
VP4DPWSSD | | Dot-product of signed words with dword accumulation, 4 iterations |
VP4DPWSSDS | | Dot-product of signed words with dword accumulation and saturation, 4 iterations |
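The "4 iterations" pattern can be illustrated on a single 32-bit accumulator lane. The Python sketch below assumes that each iteration adds the products of one pair of signed 16-bit words from each operand to the accumulator, wrapping modulo 2^32 in the plain form and clamping to the signed 32-bit range after each iteration in the saturating form. Where the hardware applies saturation, and how the operands are laid out across registers and memory, are simplifications not taken from the table; all names are hypothetical.

```python
from typing import Sequence, Tuple

INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

WordPair = Tuple[int, int]  # two signed 16-bit words packed in one dword


def wrap_int32(x: int) -> int:
    """Reduce x modulo 2**32 into the signed 32-bit range."""
    return ((x - INT32_MIN) & 0xFFFFFFFF) + INT32_MIN


def saturate_int32(x: int) -> int:
    """Clamp x to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, x))


def dot4_words(acc: int, a: Sequence[WordPair], b: Sequence[WordPair],
               saturate: bool = False) -> int:
    """Four iterations of word dot-product with dword accumulation on one lane (model)."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected exactly four word pairs per operand")
    for (a0, a1), (b0, b1) in zip(a, b):
        for w in (a0, a1, b0, b1):
            if not INT16_MIN <= w <= INT16_MAX:
                raise ValueError(f"{w} is not a signed 16-bit value")
        total = acc + a0 * b0 + a1 * b1
        # Assumption: wrap or saturate after every iteration.
        acc = saturate_int32(total) if saturate else wrap_int32(total)
    return acc


if __name__ == "__main__":
    pairs = [(1, 2), (3, 4), (5, 6), (7, 8)]
    print(dot4_words(10, pairs, [(1, 1)] * 4))  # 46
```

The difference between the two variants only shows up near the ends of the 32-bit range: starting from the largest positive accumulator, the wrapping form rolls over to a large negative value, while the saturating form stays pinned at the maximum.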
The Xeon Phi "Knights Landing" and "Knights Mill" processors also support the PREFETCHWT1 m8 instruction (opcode 0F 0D /2, prefetch into L2 cache with intent to write); these were the only Intel CPUs to officially support this instruction, but it continues to be supported on some non-Intel processors (e.g. Zhaoxin YongFeng).
A handful of instructions to support System Management Mode were introduced in the Am386SXLV and Am386DXLV processors.[6] They were also present in the later Am486SXLV/DXLV and Elan SC300/310 processors.[7]
The SMM functionality of these processors was implemented using Intel ICE microcode without a valid license, resulting in a lawsuit that AMD lost in late 1994.[8] As a result of this loss, the ICE microcode was removed from all later AMD CPUs, and the SMM instructions removed with it.
Instruction | Opcode | Description |
---|---|---|
SMI | F1 | Call SMM interrupt handler (only if DR7 bit 12 is set; not available on Am486SXLV/DXLV[9]) |
UMOV r/m8, r8 | 0F 10 /r | Move data between registers and main system memory |
UMOV r/m, r16/32 | 0F 11 /r | |
UMOV r8, r/m8 | 0F 12 /r | |
RES3 | 0F 07 | Return from SMM interrupt handler (Am386SXLV/DXLV only). Takes a pointer in ES:EDI to a processor save state to resume from; this save state has a format nearly identical to that of the undocumented Intel 386 LOADALL instruction.[10] |
RES4 | 0F 07 | Return from SMM interrupt handler (Am486SXLV/DXLV only). Similar to RES3, but with a different save state format.[11] |
These SMM instructions were also present on the IBM 386SLC and its derivatives (albeit with the LOADALL-like SMM return opcode 0F 07 named ICERET),[12] as well as on the UMC U5S processor.[13]
See main article: 3DNow!.
The 3DNow! instruction set extension was introduced in the AMD K6-2, mainly adding support for floating-point SIMD instructions using the MMX registers (two FP32 components in a 64-bit vector register). The instructions were mainly promoted by AMD, but were supported on some non-AMD CPUs as well. The processors supporting 3DNow! were:
Instruction | Opcode | Instruction description |
---|---|---|
PFADD mm1,mm2/m64 | 0F 0F /r 9E | Packed floating-point addition: dst <- dst + src |
PFSUB mm1,mm2/m64 | 0F 0F /r 9A | Packed floating-point subtraction: dst <- dst − src |
PFSUBR mm1,mm2/m64 | 0F 0F /r AA | Packed floating-point reverse subtraction: dst <- src − dst |
PFMUL mm1,mm2/m64 | 0F 0F /r B4 | Packed floating-point multiplication: dst <- dst * src |
PFMAX mm1,mm2/m64 | 0F 0F /r A4 | Packed floating-point maximum: dst <- (dst > src) ? dst : src |
PFMIN mm1,mm2/m64 | 0F 0F /r 94 | Packed floating-point minimum: dst <- (dst < src) ? dst : src |
PFCMPEQ mm1,mm2/m64 | 0F 0F /r B0 | Packed floating-point comparison, equal: dst <- (dst == src) ? 0xFFFFFFFF : 0 |
PFCMPGE mm1,mm2/m64 | 0F 0F /r 90 | Packed floating-point comparison, greater than or equal: dst <- (dst >= src) ? 0xFFFFFFFF : 0 |
PFCMPGT mm1,mm2/m64 | 0F 0F /r A0 | Packed floating-point comparison, greater than: dst <- (dst > src) ? 0xFFFFFFFF : 0 |
PF2ID mm1,mm2/m64 | 0F 0F /r 1D | Packed floating-point to 32-bit signed integer conversion, with round-to-zero |
PI2FD mm1,mm2/m64 | 0F 0F /r 0D | Packed 32-bit signed integer to floating-point conversion, with round-to-zero |
PFRCP mm1,mm2/m64 | 0F 0F /r 96 | Floating-point reciprocal approximation (at least 14-bit precision):<br/>temp <- approx(1.0/src[31:0])<br/>dst[31:0] <- temp<br/>dst[63:32] <- temp |
PFRSQRT mm1,mm2/m64 | 0F 0F /r 97 | Floating-point reciprocal square root approximation (at least 15-bit precision):<br/>temp <- approx(1.0/sqrt(src[31:0]))<br/>dst[31:0] <- temp<br/>dst[63:32] <- temp |
PFRCPIT1 mm1,mm2/m64 | 0F 0F /r A6 | Packed floating-point reciprocal, first iteration step |
PFRSQIT1 mm1,mm2/m64 | 0F 0F /r A7 | Packed floating-point reciprocal square root, first iteration step |
PFRCPIT2 mm1,mm2/m64 | 0F 0F /r B6 | Packed floating-point reciprocal/reciprocal square root, second iteration step |
PFACC mm1,mm2/m64 | 0F 0F /r AE | Floating-point accumulate (horizontal add):<br/>dst[31:0] <- dst[31:0] + dst[63:32]<br/>dst[63:32] <- src[31:0] + src[63:32] |
PMULHRW mm1,mm2/m64 | 0F 0F /r B7 | Multiply signed packed 16-bit integers with rounding and store the high 16 bits: dst <- ((dst * src) + 0x8000) >> 16 |
PAVGUSB mm1,mm2/m64 | 0F 0F /r BF | Average of unsigned packed 8-bit integers: dst <- (src + dst + 1) >> 1 |
FEMMS | 0F 0E | Faster Enter/Exit of the MMX or x87 floating-point state |
The 3DNow! specification[14] does not directly specify the operation performed by the PFRCPIT1, PFRSQIT1 and PFRCPIT2 instructions; instead, it imposes requirements on the results of using these instructions together in specific ways. If the bottom 32 bits of mm0 hold an FP32 value x, the instruction sequence PFRCP mm1,mm0 / PFRCPIT1 mm0,mm1 / PFRCPIT2 mm0,mm1 must fill both 32-bit lanes of mm0 with an approximation of 1/x. Similarly, the instruction sequence PFRSQRT mm1,mm0 / MOVQ mm2,mm1 / PFMUL mm1,mm1 / PFRSQIT1 mm1,mm0 / PFRCPIT2 mm1,mm2 must fill both 32-bit lanes of mm1 with an approximation of 1/sqrt(x).
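The reciprocal instructions follow an "approximate, then refine" scheme. Since the specification constrains only the results of the full sequences, the Python sketch below does not try to reproduce PFRCPIT1, PFRSQIT1 or PFRCPIT2 individually. Instead it substitutes the textbook Newton-Raphson step for each function, assuming that one such step is roughly what the iteration instructions jointly provide, and imitates the 14-bit and 15-bit hardware seeds by truncating the significand. All function names are hypothetical, and results are rounded to FP32 with `struct`.

```python
import math
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def keep_bits(x: float, bits: int) -> float:
    """Truncate x to `bits` significant bits, mimicking a low-precision seed."""
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(int(m * (1 << bits)) / (1 << bits), e)


def reciprocal(b: float) -> float:
    x = keep_bits(1.0 / b, 14)  # stand-in for PFRCP's seed
    # Assumed refinement: one Newton-Raphson step for f(x) = 1/x - b.
    return to_fp32(x * (2.0 - b * x))


def reciprocal_sqrt(b: float) -> float:
    x = keep_bits(1.0 / math.sqrt(b), 15)  # stand-in for PFRSQRT's seed
    # Assumed refinement: one Newton-Raphson step for f(x) = 1/x**2 - b.
    return to_fp32(x * (3.0 - b * x * x) / 2.0)


if __name__ == "__main__":
    for b in (3.0, 10.0):
        seed_err = abs(keep_bits(1.0 / b, 14) * b - 1.0)
        refined_err = abs(reciprocal(b) * b - 1.0)
        print(f"b={b}: seed error {seed_err:.2e}, refined error {refined_err:.2e}")
```

One Newton-Raphson step roughly squares the relative error, so a seed good to about 14 bits is pushed to the limit of FP32 precision; this is why a cheap low-precision table lookup followed by iteration steps can replace a full-precision divide or square root.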
The PREFETCHW instruction is also supported on Intel CPUs starting with Pentium 4,[15] albeit executed as NOP until Broadwell.
Instruction | Opcode | Instruction description |
---|---|---|
PF2IW mm1,mm2/m64 | 0F 0F /r 1C | Packed 32-bit floating-point to 16-bit signed integer conversion, with round-to-zero |
PI2FW mm1,mm2/m64 | 0F 0F /r 0C | Packed 16-bit signed integer to 32-bit floating-point conversion |
PSWAPD mm1,mm2/m64 | 0F 0F /r BB | Swap the two 32-bit halves: dst[31:0] <- src[63:32], dst[63:32] <- src[31:0] |