List of discontinued x86 instructions

This list covers instructions that were at some point documented in one or more x86 processors, but whose processor series have since been discontinued or superseded, with no known plans to reintroduce the instructions.

Intel instructions

i386 instructions

The following instructions were introduced in the Intel 80386, but later discontinued:

Instruction	Opcode	Description	Eventual fate
		Extract Bit String	Discontinued from revision B1 of the 80386 onwards. Opcodes briefly reused for `CMPXCHG` in Intel 486 stepping A only − `CMPXCHG` was moved to a different opcode from 486 stepping B onwards. Opcodes later reused for VIA PadLock.
`IBTS r/m, r`	`0F A7 /r`	Insert Bit String
	`0F 24 /r`	Move from test register	Present in Intel 386 and 486 − not present in Intel Pentium or any later Intel CPUs (except the i486-derived Quark X1000). Present in all Cyrix CPUs.
`MOV TRx,r32`	`0F 26 /r`	Move to test register

Itanium instructions

These instructions are only present in the x86 operation mode of early Intel Itanium processors with hardware support for x86. This support was added in "Merced" and removed in "Montecito", replaced with software emulation.

MPX instructions

See main article: Intel MPX.

These instructions were introduced in 6th generation Intel Core "Skylake" CPUs. The last CPU generation to support them was the 9th generation Core "Coffee Lake" CPUs.

Intel MPX adds 4 new registers, BND0 to BND3, that each contains a pair of addresses. MPX also defines a bounds-table as a 2-level directory/table data structure in memory that contains sets of upper/lower bounds.

Instruction	Opcode	Description
`BNDMK b, m`		Make lower and upper bound from memory address expression. The lower bound is given by the base component of the address, the upper bound by the 1's complement of the address as a whole.
`BNDCL b, r/m`	`F3 0F 1A /r`	Check address against lower bound. `BNDCL`, `BNDCU` and `BNDCN` all produce a #BR exception if the bounds check fails.
`BNDCU b, r/m`	`F2 0F 1A /r`	Check address against upper bound in 1's-complement form
`BNDCN b, r/m`	`F2 0F 1B /r`	Check address against upper bound.
	`66 0F 1A /r`	Move a pair of memory bounds to/from memory or between bounds-registers.
`BNDMOV b/m, b`
`BNDLDX b,mib`	`NP 0F 1A /r`	Load bounds from the bounds-table, with address translation based on the SIB-addressing expression mib.
`BNDSTX mib,b`	`NP 0F 1B /r`	Store bounds into the bounds-table, with address translation based on the SIB-addressing expression mib.
`BND`	`F2`	Instruction prefix used with certain branch instructions to indicate that they should not clear the bounds registers.
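
The bounds checks in the table above can be illustrated with a small Python model. This is a minimal sketch, not a description of the hardware: the function names, the tuple representation of a bounds register and the `BoundRangeExceeded` exception (standing in for #BR) are assumptions made for illustration, and the bounds-table instructions are not modelled.

```python
MASK64 = (1 << 64) - 1  # assume 64-bit addresses for this sketch


class BoundRangeExceeded(Exception):
    """Hypothetical stand-in for the #BR exception."""


def bndmk(base, offset):
    """Model BNDMK: lower bound = base, upper bound stored as the
    1's complement of the full address (base + offset)."""
    return (base & MASK64, ~(base + offset) & MASK64)


def bndcl(bnd, addr):
    """Model BNDCL: fail if the address is below the lower bound."""
    if addr < bnd[0]:
        raise BoundRangeExceeded("below lower bound")


def bndcu(bnd, addr):
    """Model BNDCU: the stored upper bound is in 1's-complement form,
    so it is inverted before the comparison."""
    if addr > (~bnd[1] & MASK64):
        raise BoundRangeExceeded("above upper bound")


def bndcn(bnd, addr):
    """Model BNDCN: compare against the stored upper bound as-is."""
    if addr > bnd[1]:
        raise BoundRangeExceeded("above upper bound")
```

In this model, `bndmk(0x1000, 0xFF)` describes a 256-byte object: `bndcl` and `bndcu` accept any address from `0x1000` to `0x10FF` and raise for anything outside that range.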

Hardware Lock Elision

The Hardware Lock Elision feature of Intel TSX is marked in the Intel SDM as removed from 2019 onwards.^[2] This feature took the form of two instruction prefixes, XACQUIRE and XRELEASE, that could be attached to memory atomics/stores to elide the memory locking that they represent.

Instruction prefix	Opcode	Description
`XACQUIRE`	`F2`	Instruction prefix to indicate start of hardware lock elision, used with memory atomic instructions only (for other instructions, the `F2` prefix may have other meanings). When used with such instructions, may start a transaction instead of performing the memory atomic operation.
`XRELEASE`	`F3`	Instruction prefix to indicate end of hardware lock elision, used with memory atomic/store instructions only (for other instructions, the `F3` prefix may have other meanings). When used with such instructions during hardware lock elision, will end the associated transaction instead of performing the store/atomic.

VP2Intersect instructions

The VP2INTERSECT instructions (an AVX-512 subset) were introduced in Tiger Lake (11th generation mobile Core processors), but were never officially supported on any other Intel processors. They are now considered deprecated^[3] and are listed in the Intel SDM as removed from 2023 onwards.

As of July 2024, the VP2INTERSECT instructions have been re-introduced on AMD Zen 5 processors.^[4]

Instruction	Opcode	Description
`VP2INTERSECTD k1+1, ymm2, ymm3/m256/m32bcst` `VP2INTERSECTD k1+1, zmm2, zmm3/m512/m32bcst`		Store, in an even/odd pair of mask registers, the indicators of the locations of value matches between 32-bit lanes in the two vector source arguments.
`VP2INTERSECTQ k1+1, ymm2, ymm3/m256/m64bcst` `VP2INTERSECTQ k1+1, zmm2, zmm3/m512/m64bcst`		Store, in an even/odd pair of mask registers, the indicators of the locations of value matches between 64-bit lanes in the two vector source arguments.
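
The match-indicator behaviour described in the table can be sketched in Python as follows. This is an illustrative model only: the function name is hypothetical, the vectors are plain lists of integers, and it assumes that the even mask register tracks lanes of the first source and the odd mask register tracks lanes of the second source.

```python
def vp2intersect(a, b):
    """Return (k_even, k_odd) bitmasks for two equal-length lane lists.

    Bit i of k_even is set if a[i] equals any lane of b;
    bit j of k_odd is set if b[j] equals any lane of a.
    """
    if len(a) != len(b):
        raise ValueError("both sources must have the same lane count")
    k_even = 0
    k_odd = 0
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                k_even |= 1 << i
                k_odd |= 1 << j
    return k_even, k_odd
```

The all-pairs comparison is what distinguishes these instructions from an ordinary lane-by-lane compare, which only tests `a[i]` against `b[i]`.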

Instructions specific to Xeon Phi processors

"Knights Corner" instructions

The first generation Xeon Phi processors, codenamed "Knights Corner" (KNC), supported a large number of instructions that are not seen in any later x86 processor. An instruction reference is available^[5] − the instructions/opcodes unique to KNC are the ones with VEX and MVEX prefixes (except for the KMOV, KNOT and KORTEST instructions − these are kept with the same opcodes and function in AVX-512, but with an added "W" appended to their instruction names).

Most of these KNC-unique instructions are similar but not identical to instructions in AVX-512 − later Xeon Phi processors replaced these instructions with AVX-512.

Early versions of AVX-512 avoided the instruction encodings used by KNC's MVEX prefix. However, with the introduction of Intel APX (Advanced Performance Extensions) in 2023, some of the old KNC MVEX instruction encodings have been reused for new APX encodings. For example, both KNC and APX accept the instruction encoding as valid, but assign different meanings to it:

KNC: - vector load with data conversion
APX: - vector load with one of the new APX extended-GPRs used as scaled index

"Knights Landing" and "Knights Mill" instructions

Some of the AVX-512 instructions in the Xeon Phi "Knights Landing" and later models belong to the AVX-512 subsets "AVX512ER", "AVX512_4FMAPS", "AVX512PF" and "AVX512_4VNNIW", all of which are unique to the Xeon Phi series of processors. The ER and PF subsets were introduced in "Knights Landing" − the 4FMAPS and 4VNNIW instructions were later added in "Knights Mill".

The ER and 4FMAPS instructions are floating-point arithmetic instructions that all follow a given pattern where:

EVEX.W is used to specify floating-point format (0=FP32, 1=FP64)
The bottom opcode bit is used to select between packed and scalar operation (0: packed, 1:scalar)
For a given operation, all the scalar/packed variants belong to the same AVX-512 subset.
The instructions all support result masking by opmask registers. The AVX512ER instructions also all support broadcast of memory operands.
The only supported vector width is 512 bits.

Xeon Phi specific instructions (ER, 4FMAPS)
Operation	AVX-512 subset	Basic opcode	Packed FP32 instructions (W=0)	Packed FP64 instructions (W=1)	RC/SAE
Reciprocal approximation with an accuracy of 2^-28	ER				SAE
Reciprocal square root approximation with an accuracy of 2^-28	ER				SAE
Exponential 2^x approximation with 2^-23 relative error	ER	`EVEX.66.0F38 C8 /r`	`VEXP2PS z,z/m512`	`VEXP2PD z,z/m512`	SAE
Fused-multiply-add, 4 iterations	4FMAPS
Fused negate-multiply-add, 4 iterations	4FMAPS
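
The encoding pattern listed above (EVEX.W selects the floating-point format, the bottom opcode bit selects packed or scalar operation) can be expressed as a tiny decoder. This is a sketch for illustration only; the function name is hypothetical, and it does not check whether a given opcode byte corresponds to an instruction that actually exists.

```python
def decode_fp_variant(opcode, evex_w):
    """Classify an ER/4FMAPS-style encoding by the pattern described above.

    opcode: the basic opcode byte (e.g. 0xC8 for the 2^x approximation)
    evex_w: the EVEX.W bit (0 or 1)
    """
    fp_format = "FP64" if evex_w else "FP32"           # EVEX.W: 0=FP32, 1=FP64
    operation = "scalar" if opcode & 1 else "packed"   # bottom opcode bit
    return fp_format, operation
```

For the `C8` opcode from the table, `decode_fp_variant(0xC8, 0)` yields `("FP32", "packed")`, matching `VEXP2PS`, and `decode_fp_variant(0xC8, 1)` yields `("FP64", "packed")`, matching `VEXP2PD`.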

The AVX512PF instructions are a set of 16 prefetch instructions. They all use VSIB encoding, which requires a memory addressing mode using the SIB byte and takes the index part of the SIB byte as an index into the AVX512 vector register file rather than the GPR register file. The selected AVX512 vector register is then interpreted as a vector of indexes: the standard x86 base+index+displacement address calculation is performed for each vector lane, and one associated memory operation (a prefetch, in the case of the AVX512PF instructions) is performed for each active lane. The instruction encodings all follow a pattern where:

EVEX.W is used to specify format of the prefetchable data (0:FP32, 1:FP64)
The bottom bit of the opcode is used to indicate whether the AVX512 index register is considered a vector of sixteen signed 32-bit indexes (bit 0 not set) or eight signed 64-bit indexes (bit 0 set)
The instructions all support operation masking by opmask registers.
The only supported vector width is 512 bits.

Operation	Basic opcode	32-bit indexes (opcode `C6`), FP32 prefetch (W=0)	32-bit indexes (opcode `C6`), FP64 prefetch (W=1)	64-bit indexes (opcode `C7`), FP32 prefetch (W=0)	64-bit indexes (opcode `C7`), FP64 prefetch (W=1)
Prefetch into L1 cache (T0 hint)
Prefetch into L2 cache (T1 hint)
Prefetch into L1 cache (T0 hint) with intent to write
Prefetch into L2 cache (T1 hint) with intent to write
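
The per-lane address calculation performed under VSIB addressing can be sketched as below. This is a simplified model under stated assumptions: the function name and parameters are hypothetical, the index vector is a list of signed Python integers, the scale is the usual SIB scale factor (1, 2, 4 or 8), and the prefetch itself is represented only by returning the addresses that would be touched.

```python
MASK64 = (1 << 64) - 1  # assume 64-bit addresses for this sketch


def vsib_addresses(base, index_vector, scale, disp, opmask):
    """Return one address per active lane: base + index*scale + disp.

    Lanes whose opmask bit is clear are skipped entirely.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("SIB scale must be 1, 2, 4 or 8")
    return [
        (base + index * scale + disp) & MASK64
        for lane, index in enumerate(index_vector)
        if (opmask >> lane) & 1
    ]
```

With eight 64-bit indexes, for example, `vsib_addresses(0x1000, [0, 1, -1, 4, 0, 0, 0, 0], 8, 16, 0b00001011)` computes addresses only for lanes 0, 1 and 3.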

The AVX512_4VNNIW instructions read a 128-bit data item from memory, containing 4 two-component vectors (each component being signed 16-bit). Then, for each of 4 consecutive AVX-512 registers, they will, for each 32-bit lane, interpret the lane as a two-component vector (signed 16-bit) and perform a dot-product with the corresponding two-component vector that was read from memory (the first two-component vector from memory is used for the first AVX-512 source register, and so on). These results are then accumulated into a destination vector register.

Instruction	Opcode	Description
		Dot-product of signed words with dword accumulation, 4 iterations
		Dot-product of signed words with dword accumulation and saturation, 4 iterations
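
The four-iteration dot-product described above can be modelled as follows. This is a minimal sketch of the non-saturating form only; the function names and the list-of-pairs data layout are assumptions chosen for readability, not the architectural register layout.

```python
def to_int32(value):
    """Wrap an integer to signed 32-bit range (no saturation)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def dot_product_4_iterations(dest, sources, mem_pairs):
    """Accumulate four two-component dot-products per 32-bit lane.

    dest:      list of signed 32-bit accumulators, one per lane
    sources:   4 consecutive registers, each a list of (lo, hi)
               signed 16-bit pairs, one pair per lane
    mem_pairs: the 4 (lo, hi) signed 16-bit pairs read from memory;
               pair i is used with source register i
    """
    acc = list(dest)
    for register, (m_lo, m_hi) in zip(sources, mem_pairs):
        for lane, (s_lo, s_hi) in enumerate(register):
            acc[lane] = to_int32(acc[lane] + s_lo * m_lo + s_hi * m_hi)
    return acc
```

The saturating variant would clamp each accumulation to the signed 32-bit range instead of wrapping; that is left out here to keep the sketch short.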

Xeon Phi processors (from Knights Landing onwards) also featured the PREFETCHWT1 m8 instruction (opcode 0F 0D /2, prefetch into L2 cache with intent to write) − these were the only Intel CPUs to officially support this instruction, but it continues to be supported on some non-Intel processors (e.g. Zhaoxin YongFeng).

AMD instructions

Am386 SMM instructions

A handful of instructions to support System Management Mode were introduced in the Am386SXLV and Am386DXLV processors.^[6] They were also present in the later Am486SXLV/DXLV and Elan SC300/310 processors.^[7]

The SMM functionality of these processors was implemented using Intel ICE microcode without a valid license, resulting in a lawsuit that AMD lost in late 1994.^[8] As a result of this loss, the ICE microcode was removed from all later AMD CPUs, and the SMM instructions removed with it.

Instruction	Opcode	Description
`SMI`	`F1`	Call SMM interrupt handler (only if DR7 bit 12 is set; not available on Am486SXLV/DXLV^[9])
`UMOV r/m8, r8`	`0F 10 /r`	Move data between registers and main system memory
`UMOV r/m, r16/32`	`0F 11 /r`
`UMOV r8, r/m8`	`0F 12 /r`

`RES3`	`0F 07`	Return from SMM interrupt handler (Am386SXLV/DXLV only). Takes a pointer in ES:EDI to a processor save state to resume from − this save state has a format nearly identical to that of the undocumented Intel 386 `LOADALL` instruction.^[10]
`RES4`	`0F 07`	Return from SMM interrupt handler (Am486SXLV/DXLV only). Similar to `RES3`, but with a different save state format.^[11]

These SMM instructions were also present on the IBM 386SLC and its derivatives (albeit with the LOADALL-like SMM return opcode 0F 07 named ICERET),^[12] as well as on the UMC U5S processor.^[13]

3DNow! instructions

See main article: 3DNow!.

The 3DNow! instruction set extension was introduced in the AMD K6-2, mainly adding support for floating-point SIMD instructions using the MMX registers (two FP32 components in a 64-bit vector register). The instructions were mainly promoted by AMD, but were supported on some non-AMD CPUs as well. The processors supporting 3DNow! were:

AMD K6-2, K6-III, and all processors based on the K7, K8 and K10 microarchitectures. (Later AMD microarchitectures such as Bulldozer, Bobcat and Zen do not support 3DNow!)
IDT WinChip 2 and 3
VIA Cyrix III (both "Joshua" and "Samuel" variants), and the "Samuel" and "Ezra" revisions of VIA C3. (Later VIA CPUs, from C3 "Nehemiah" onwards, dropped 3DNow! in favor of SSE.)
National Semiconductor Geode GX2; AMD Geode GX and LX.

Instruction Opcode Instruction description

PFADD mm1,mm2/m64 0F 0F /r 9E Packed floating-point addition:
dst <- dst + src

PFSUB mm1,mm2/m64 0F 0F /r 9A Packed floating-point subtraction:
dst <- dst − src

PFSUBR mm1,mm2/m64 0F 0F /r AA Packed floating-point reverse subtraction:
dst <- src − dst

PFMUL mm1,mm2/m64 0F 0F /r B4 Packed floating-point multiplication:
dst <- dst * src

PFMAX mm1,mm2/m64 0F 0F /r A4 Packed floating-point maximum:
dst <- (dst > src) ? dst : src

PFMIN mm1,mm2/m64 0F 0F /r 94 Packed floating-point minimum:
dst <- (dst < src) ? dst : src

PFCMPEQ mm1,mm2/m64 0F 0F /r B0 Packed floating-point comparison, equal:
dst <- (dst == src) ? 0xFFFFFFFF : 0

PFCMPGE mm1,mm2/m64 0F 0F /r 90 Packed floating-point comparison, greater than or equal:
dst <- (dst >= src) ? 0xFFFFFFFF : 0

PFCMPGT mm1,mm2/m64 0F 0F /r A0 Packed floating-point comparison, greater than:
dst <- (dst > src) ? 0xFFFFFFFF : 0

PF2ID mm1,mm2/m64 0F 0F /r 1D Converts packed floating-point operand to packed 32-bit signed integer, with round-to-zero

PI2FD mm1,mm2/m64 Packed 32-bit signed integer to floating-point conversion, with round-to-zero

PFRCP mm1,mm2/m64 0F 0F /r 96 Floating-point reciprocal approximation (at least 14 bit precision):
temp <- approx(1.0/src[31:0])
dst[31:0] <- temp
dst[63:32] <- temp

PFRSQRT mm1,mm2/m64 0F 0F /r 97 Floating-point reciprocal square root approximation (at least 15 bit precision):
temp <- approx(1.0/sqrt(src[31:0]))
dst[31:0] <- temp
dst[63:32] <- temp

PFRCPIT1 mm1,mm2/m64 0F 0F /r A6 Packed floating-point reciprocal, first iteration step

PFRSQIT1 mm1,mm2/m64 0F 0F /r A7 Packed floating-point reciprocal square root, first iteration step

PFRCPIT2 mm1,mm2/m64 0F 0F /r B6 Packed floating-point reciprocal/reciprocal square root, second iteration step

PFACC mm1,mm2/m64 0F 0F /r AE Floating-point accumulate (horizontal add):
dst[31:0] <- dst[31:0] + dst[63:32]
dst[63:32] <- src[31:0] + src[63:32]

0F 0F /r B7 Multiply signed packed 16-bit integers with rounding and store the high 16 bits:
dst <- ((dst * src) + 0x8000) >> 16

PAVGUSB mm1,mm2/m64 0F 0F /r BF Average of unsigned packed 8-bit integers:
dst <- (src+dst+1) >> 1

FEMMS 0F 0E Faster Enter/Exit of the MMX or x87 floating-point state

The 3DNow! specification^[14] does not directly specify the operation performed by the PFRCPIT1, PFRSQIT1 and PFRCPIT2 instructions − instead, it imposes requirements on the results of using these instructions together in specific ways:

If the bottom 32 bits of mm0 initially contain a value X in FP32 format, then the instruction sequence:

PFRCP mm1,mm0
PFRCPIT1 mm0,mm1
PFRCPIT2 mm0,mm1

must fill both 32-bit lanes of mm0 with $\frac{1.0}{X}$ in FP32 format, computed with an error of at most 1 ulp.

Similarly, the instruction sequence:

PFRSQRT mm1,mm0
MOVQ mm2,mm1
PFMUL mm1,mm1
PFRSQIT1 mm1,mm0
PFRCPIT2 mm1,mm2

must fill both 32-bit lanes of mm1 with $\frac{1.0}{\sqrt{X}}$ in FP32 format, computed with an error of at most 1 ulp.
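
To give a feel for how a coarse reciprocal estimate (as produced by PFRCP) can be refined to full FP32 precision by the iteration-step instructions, the following sketch uses one Newton-Raphson step. It is an assumption-laden illustration, not the documented behaviour of PFRCPIT1 or PFRCPIT2: the 3DNow! specification constrains only the result of the complete sequence, and all function names here are hypothetical.

```python
import math
import struct


def to_fp32(x):
    """Round a Python float to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def approx_recip(x, bits=14):
    """Hypothetical stand-in for PFRCP: 1/x with its mantissa
    truncated to `bits` bits, giving a coarse estimate."""
    mantissa, exponent = math.frexp(1.0 / x)
    scale = 1 << bits
    return math.ldexp(math.trunc(mantissa * scale) / scale, exponent)


def refined_recip(x):
    """Refine the coarse estimate with one Newton-Raphson step,
    x1 = x0 * (2 - x * x0), then round the result to FP32."""
    x0 = approx_recip(x)
    x1 = x0 * (2.0 - x * x0)
    return to_fp32(x1)
```

Each Newton-Raphson step roughly doubles the number of correct bits, which is why a single refinement of an estimate with about 14 good bits is enough to reach the 24-bit FP32 mantissa.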

3DNow! also introduced a couple of prefetch instructions: (opcode) and (opcode). These instructions, unlike the rest of 3DNow!, are not discontinued but continue to be supported on modern AMD CPUs. The PREFETCHW instruction is also supported on Intel CPUs starting with Pentium 4,^[15] albeit executed as NOP until Broadwell.

Instruction	Opcode	Instruction description
`PF2IW mm1,mm2/m64`	`0F 0F /r 1C`	Packed 32-bit floating-point to 16-bit signed integer conversion, with round-to-zero
`PI2FW mm1,mm2/m64`	`0F 0F /r 0C`	Packed 16-bit signed integer to 32-bit floating-point conversion
`PSWAPD mm1,mm2/m64`

Notes and References

[1] Intel Itanium Architecture Software Developer's Manual, volume 4 (document number: 323208, revision 2.3, May 2010).
[2] Intel SDM, volume 1, order no. 253665-083, Mar 2024, chapter 2.5.
[3] R. Singhal, Yes. Deprecated. (about VP2INTERSECT), Jul 19, 2023. Archived on Jul 23, 2023.
[4] Alexander Yee, Zen5's AVX512 Teardown + More, 7 Aug 2024.
[5] Intel, Intel® Xeon Phi™ Coprocessor Instruction Set Architecture Reference Manual, Sep 2012, order no. 327364-001. Archived on 4 Aug 2021.
[6] Am386®SX/SXL/SXLV High-Performance, Low-Power, Embedded Microprocessors, publication #21020, rev A, Apr 1997. Has SMM instruction descriptions on pages 5 and 6.
[7] AMD, Élan™SC310 Microcontroller Programmer’s Reference Manual, order no. 20665A, April 1996, section 1.9.4, page 49. Archived on 5 Sep 2024.
[8] Intel vs AMD, Case No. C-93-20301 PVT, Findings of fact and conclusions of law following "ICE" module of trial, Oct 7, 1994. Archived on 10 May 2021 at https://web.archive.org/web/20210510143606/https://ir.amd.com/sec-filings/content/0000898430-94-000804/EX-99_1.txt
[9] John H. Wharton, The Complete X86, Volume 1, 1994. MicroDesign Resources. Covers instruction set additions of Am486SXLV on page 210, Cyrix 486S on page 273 and IBM 386SLC on page 298.
[10] Potemkin's Hackers Group, OPCODE.LST v4.51, 15 Oct 1999. Archived on 21 May 2001.
[11] Hans Peter Messmer, "The Indispensable PC Hardware Book" (ISBN 0201403994), chapter 10.6.1, pages 280-281.
[12] Frank van Gilluwe, "The Undocumented PC, second edition", 1997, page 120.
[13] Microprocessor Report, UMC Announces Enhanced 486SX-Compatible (vol 8, no.7, May 30, 1994). Describes the UMC U5S as having "built-in SMM, which is hardware- and software-compatible with AMD’s implementation." Archived on 7 Sep 2024.
[14] AMD, 3DNow! Technology Manual, pub.no. 21928G/0, March 2000. Archived on 9 Oct 2018.
[15] Windows 10 64-bit requirements: Does my CPU support CMPXCHG16b, PrefetchW and LAHF/SAHF?