Comparison of CPU microarchitectures

The following is a comparison of CPU microarchitectures.

| Microarchitecture | Year | Pipeline stages | Misc |
| --- | --- | --- | --- |
| Elbrus-8S | 2014 |  | VLIW, Elbrus (proprietary, closed) version 5, 64-bit |
| AMD K5 | 1996 | 5 | Superscalar, branch prediction, speculative execution, out-of-order execution, register renaming |
| AMD K6 | 1997 | 6 | Superscalar, branch prediction, speculative execution, out-of-order execution, register renaming |
| AMD K6-III | 1999 |  | Branch prediction, speculative execution, out-of-order execution[1] |
| AMD K7 | 1999 |  | Out-of-order execution, branch prediction, Harvard architecture |
| AMD K8 | 2003 |  | 64-bit, integrated memory controller, 16-byte instruction prefetching |
| AMD K10 | 2007 |  | Superscalar, out-of-order execution, 32-way set associative L3 victim cache, 32-byte instruction prefetching |
| ARM7TDMI (-S) | 2001 | 3 |  |
| ARM7EJ-S | 2001 | 5 |  |
| ARM810 |  | 5 | Static branch prediction, double-bandwidth memory |
| ARM9TDMI | 1998 | 5 |  |
| ARM1020E |  | 6 |  |
| XScale PXA210/PXA250 | 2002 | 7 |  |
| ARM1136J(F)-S |  | 8 |  |
| ARM1156T2(F)-S |  | 9 |  |
| ARM Cortex-A5 |  | 8 | Multi-core, single issue, in-order |
| ARM Cortex-A7 MPCore |  | 8 | Partial dual-issue, in-order, 2-way set associative level 1 instruction cache |
| ARM Cortex-A8 | 2005 | 13 | Dual-issue, in-order, speculative execution, superscalar, 2-way pipeline decode |
| ARM Cortex-A9 MPCore | 2007 | 8–11 | Out-of-order, speculative issue, superscalar |
| ARM Cortex-A15 MPCore | 2010 | 15 | Multi-core (up to 16), out-of-order, speculative issue, 3-way superscalar |
| ARM Cortex-A53 | 2012 |  | Partial dual-issue, in-order |
| ARM Cortex-A55 | 2017 | 8 | In-order, speculative execution |
| ARM Cortex-A57 | 2012 |  | Deeply out-of-order, wide multi-issue, 3-way superscalar |
| ARM Cortex-A72 | 2015 |  |  |
| ARM Cortex-A73 | 2016 |  | Out-of-order superscalar |
| ARM Cortex-A75 | 2017 | 11–13 | Out-of-order superscalar, speculative execution, register renaming, 3-way |
| ARM Cortex-A76 | 2018 | 13 | Out-of-order superscalar, 4-way pipeline decode |
| ARM Cortex-A77 | 2019 | 13 | Out-of-order superscalar, speculative execution, register renaming, 6-way pipeline decode, 10-issue, branch prediction, L3 cache |
| ARM Cortex-A78 | 2020 | 13 | Out-of-order superscalar, register renaming, 4-way pipeline decode, 6 instructions per cycle, branch prediction, L3 cache |
| ARM Cortex-A710 | 2021 | 10 |  |
| ARM Cortex-X1 | 2020 | 13 | 5-wide decode out-of-order superscalar, L3 cache |
| ARM Cortex-X2 | 2021 | 10 |  |
| ARM Cortex-X3 | 2022 | 9 |  |
| ARM Cortex-X4 | 2023 | 10 |  |
| AVR32 AP7 |  | 7 |  |
| AVR32 UC3 |  | 3 | Harvard architecture |
| Bobcat | 2011 |  | Out-of-order execution |
| Bulldozer | 2011 | 20 | Shared multithreaded L2 cache, multithreading, multi-core, around 20 stage long pipeline, integrated memory controller, out-of-order, superscalar, up to 16 cores per chip, up to 16 MB L3 cache, Virtualization, Turbo Core, FlexFPU which uses simultaneous multithreading[2] |
| Piledriver | 2012 |  | Shared multithreaded L2 cache, multithreading, multi-core, around 20 stage long pipeline, integrated memory controller, out-of-order, superscalar, up to 16 MB L2 cache, up to 16 MB L3 cache, Virtualization, FlexFPU which uses simultaneous multithreading, up to 16 cores per chip, up to 5 GHz clock speed, up to 220 W TDP, Turbo Core |
| Steamroller | 2014 |  | Multi-core, branch prediction |
| Excavator | 2015 | 20 | Multi-core |
| Zen | 2017 | 19 | Multi-core, superscalar, 2-way simultaneous multithreading, 4-way decode, out-of-order execution, L3 cache |
| Zen+ | 2018 | 19 | Multi-core, superscalar, 4-way decode, out-of-order execution, L3 cache |
| Zen 2 | 2019 | 19 | Multi-chip module, multi-core, superscalar, 4-way decode, out-of-order execution, L3 cache |
| Zen 3 | 2020 | 19 | Multi-chip module, multi-core, superscalar, 4-way decode, out-of-order execution, SMT, L3 cache |
| Zen 4 | 2022 |  | Multi-chip module, multi-core, superscalar, L3 cache |
| Crusoe | 2000 |  | In-order execution, 128-bit VLIW, integrated memory controller |
| Efficeon | 2004 |  | In-order execution, 256-bit VLIW, fully integrated memory controller |
| Cyrix Cx5x86 | 1995 | 6[3] | Branch prediction |
| Cyrix 6x86 | 1996 |  | Superscalar, superpipelined, register renaming, speculative execution, out-of-order execution |
| DLX |  | 5 |  |
| eSi-3200 |  | 5 | In-order, speculative issue |
| eSi-3250 |  | 5 | In-order, speculative issue |
| EV4 (Alpha 21064) |  |  | Superscalar |
| EV7 (Alpha 21364) |  |  | Superscalar design with out-of-order execution, branch prediction, 4-way simultaneous multithreading, integrated memory controller |
| EV8 (Alpha 21464) |  |  | Superscalar design with out-of-order execution |
| 65k |  |  | Ultra low power consumption, register renaming, out-of-order execution, branch prediction, multi-core, module, capable of reaching higher clock speeds |
| P5 (Pentium) | 1993 | 5 | Superscalar |
| P6 (Pentium Pro) |  | 14 | Speculative execution, register renaming, superscalar design with out-of-order execution |
| P6 (Pentium II) |  | 14[4] | Branch prediction |
| P6 (Pentium III) | 1995 | 14 |  |
| Intel Itanium "Merced" | 2001 |  | Single core, L3 cache |
| Intel Itanium 2 "McKinley" | 2002 | 11[5] | Speculative execution, branch prediction, register renaming, 30 execution units, multithreading, multi-core, coarse-grained multithreading, 2-way simultaneous multithreading, Dual-domain multithreading, Turbo Boost, Virtualization, VLIW, RAS with Advanced Machine Check Architecture, Instruction Replay technology, Cache Safe technology, Enhanced SpeedStep technology |
| Intel NetBurst (Willamette) | 2000 | 20 | 2-way simultaneous multithreading (Hyper-threading), Rapid Execution Engine, Execution Trace Cache, quad-pumped Front-Side Bus, Hyper-pipelined Technology, superscalar, out-of-order |
| NetBurst (Northwood) | 2002 | 20 | 2-way simultaneous multithreading |
| NetBurst (Prescott) | 2004 | 31 | 2-way simultaneous multithreading |
| NetBurst (Cedar Mill) | 2006 | 31 | 2-way simultaneous multithreading |
| Intel Core | 2006 | 12 | Multi-core, out-of-order, 4-way superscalar |
| Intel Atom |  | 16 | 2-way simultaneous multithreading, in-order, no instruction reordering, speculative execution, or register renaming |
| Intel Atom Oak Trail |  |  | 2-way simultaneous multithreading, in-order, burst mode, 512 KB L2 cache |
| Intel Atom Bonnell | 2008 |  | SMT |
| Intel Atom Silvermont | 2013 |  | Out-of-order execution |
| Intel Atom Goldmont | 2016 |  | Multi-core, out-of-order execution, 3-wide superscalar pipeline, L2 cache |
| Intel Atom Goldmont Plus | 2017 |  | Multi-core |
| Intel Atom Tremont | 2019 |  | Multi-core, superscalar, out-of-order execution, speculative execution, register renaming |
| Intel Atom Gracemont | 2021 |  | Multi-core, superscalar, out-of-order execution, speculative execution, register renaming |
| Intel Atom Crestmont | 2023 |  | Multi-core |
| Intel Atom Skymont | 2024 |  | Multi-core |
| Nehalem | 2008 | 14 | 2-way simultaneous multithreading, out-of-order, 6-way superscalar, integrated memory controller, L1/L2/L3 cache, Turbo Boost |
| Sandy Bridge | 2011 | 14 | 2-way simultaneous multithreading, multi-core, on-die graphics and PCIe controller, system agent with integrated memory and display controller, ring interconnect, L1/L2/L3 cache, micro-op cache, 2 threads per core, Turbo Boost |
| Intel Haswell | 2013 | 14–19 | SoC design, multi-core, multithreading, 2-way simultaneous multithreading, hardware-based transactional memory (in selected models), L4 cache (in GT3 models), Turbo Boost, out-of-order execution, superscalar, up to 8 MB L3 cache (mainstream), up to 20 MB L3 cache (Extreme) |
| Broadwell | 2014 | 14–19 | Multi-core, multithreading |
| Skylake | 2015 | 14–19 | Multi-core, L4 cache on certain Skylake-R, Skylake-U and Skylake-Y models. On-package PCH on U, Y, m3, m5 and m7 models. 5 wide superscalar/5 issues. |
| Kaby Lake | 2016 | 14–19 | Multi-core, L4 cache on certain low and ultra low power models (Kaby Lake-U and Kaby Lake-Y) |
| Intel Sunny Cove | 2019 | 14–20 | Multicore, 2-way multithreading, massive OoOE engine, 5 wide superscalar/5 issue. |
| Intel Cypress Cove | 2021 | 14 | Multicore, 5 wide superscalar/6 issues, massive OoOE engine, big core design. |
| Intel Willow Cove | 2020 |  | Multicore, SMT |
| Intel Golden Cove | 2021 |  | Multicore, SMT |
| Intel Redwood Cove | 2023 |  | Multicore, SMT |
| Intel Lion Cove | 2024 |  | Multicore, without SMT |
| Intel Xeon Phi 7120x | 2013 | 7-stage integer, 6-stage vector | Multi-core, multithreading, 4 hardware-based simultaneous threads per core which can't be disabled unlike regular HyperThreading, Time-multiplexed multithreading, 61 cores per chip, 244 threads per chip, 30.5 MB L2 cache, 300 W TDP, Turbo Boost, in-order dual-issue pipelines, coprocessor, Floating-point accelerator, 512-bit wide Vector-FPU |
| LatticeMico32 | 2006 | 6 | Harvard architecture |
| Nvidia Denver | 2014 |  | Multicore, superscalar, 2-way decode, L2 |
| Nvidia Carmel | 2018 |  | Multicore, 10-way superscalar, L3 |
| POWER1 | 1990 |  | Superscalar, out-of-order execution |
| POWER3 | 1998 |  | Superscalar, out-of-order execution |
| POWER4 | 2001 |  | Superscalar, speculative execution, out-of-order execution |
| POWER5 | 2004 |  | 2-way simultaneous multithreading, out-of-order execution, integrated memory controller |
| IBM POWER6 | 2007 |  | 2-way simultaneous multithreading, in-order execution, up to 5 GHz |
| IBM POWER7+ |  |  | Multi-core, multithreading, out-of-order, superscalar, 4 intelligent simultaneous threads per core, 12 execution units per core, 8 cores per chip, 80 MB L3 cache, true hardware entropy generator, hardware-assisted cryptographic acceleration, fixed-point unit, decimal fixed-point unit, Turbo Core, decimal floating-point unit |
| IBM POWER8 | 2013 | 15–23 | Superscalar, L4 cache |
| IBM POWER9 | 2017 | 12–16 | Superscalar, out-of-order execution, L4 cache |
| IBM Power10 | 2021 |  | Superscalar |
| IBM Cell | 2006 |  | Multi-core, multithreading, 2-way simultaneous multithreading (PPE), Power Processor Element, Synergistic Processing Elements, Element Interconnect Bus, in-order execution |
| IBM Cyclops64 |  |  | Multi-core, multithreading, 2 threads per core, in-order |
| IBM zEnterprise zEC12 | 2012 | 15/16/17 | Multi-core, 6 cores per chip, up to 5.5 GHz, superscalar, out-of-order, 48 MB L3 cache, 384 MB shared L4 cache |
| IBM A2 |  | 15 | Multicore, 4-way simultaneous multithreaded |
| PowerPC 401 | 1996 | 3 |  |
| PowerPC 405 | 1998 | 5 |  |
| PowerPC 440 | 1999 | 7 |  |
| PowerPC 470 | 2009 | 9 | Symmetric multiprocessing (SMP) |
| PowerPC e300 |  | 4 | Superscalar, branch prediction |
| PowerPC e500 |  | Dual 7 stage | Multi-core |
| PowerPC e600 |  | 3-issue 7 stage | Superscalar out-of-order execution, branch prediction |
| PowerPC e5500 | 2010 | 4-issue 7 stage | Out-of-order, multi-core |
| PowerPC e6500 | 2012 |  | Multi-core |
| PowerPC 603 |  | 4 | 5 execution units, branch prediction, no SMP |
| PowerPC 603q | 1996 | 5 | In-order |
| PowerPC 604 | 1994 | 6 | Superscalar, out-of-order execution, 6 execution units, SMP support |
| PowerPC 620 | 1997 | 5 | Out-of-order execution, SMP support |
| PWRficient PA6T | 2007 |  | Superscalar, out-of-order execution, 6 execution units |
| R4000 | 1991 | 8 | Scalar |
| StrongARM SA-110 | 1996 | 5 | Scalar, in-order |
| SuperH SH2 |  | 5 |  |
| SuperH SH2A | 2006 | 5 | Superscalar, Harvard architecture |
| SPARC |  |  | Superscalar |
| hyperSPARC | 1993 |  | Superscalar |
| SuperSPARC | 1992 |  | Superscalar, in-order |
| SPARC64 VI/VII/VII+ | 2007 |  | Superscalar, out-of-order[6] |
| UltraSPARC | 1995 | 9 |  |
| UltraSPARC T1 | 2005 | 6 | Open source, multithreading, multi-core, 4 threads per core, scalar, in-order, integrated memory controller, 1 FPU |
| UltraSPARC T2 | 2007 | 8 | Open source, multithreading, multi-core, 8 threads per core |
| SPARC T3 | 2010 | 8 | Multithreading, multi-core, 8 threads per core, SMP, 16 cores per chip, 2 MB L3 cache, in-order, hardware random number generator |
| Oracle SPARC T4 | 2011 | 16 | Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, SMP, 8 cores per chip, out-of-order, 4 MB L3 cache, out-of-order, hardware random number generator |
| Oracle Corporation SPARC T5 | 2013 | 16 | Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, 16 cores per chip, out-of-order, 16-way associative shared 8 MB L3 cache, hardware-assisted cryptographic acceleration, stream-processing unit, out-of-order execution, RAS features, 16 cryptography units per chip, hardware random number generator |
| Oracle SPARC M5 |  | 16 | Multithreading, multi-core, 8 fine-grained threads per core of which 2 can be executed simultaneously, 2-way simultaneous multithreading, 6 cores per chip, out-of-order, 48 MB L3 cache, out-of-order execution, RAS features, stream-processing unit, hardware-assisted cryptographic acceleration, 6 cryptography units per chip, hardware random number generator |
| Fujitsu SPARC64 X |  |  | Multithreading, multi-core, 2-way simultaneous multithreading, 16 cores per chip, out-of-order, 24 MB L2 cache, out-of-order, RAS features |
| Imagination Technologies MIPS Warrior |  |  |  |
| VIA C7 | 2005 |  | In-order execution |
| VIA Nano (Isaiah) | 2008 |  | Superscalar out-of-order execution, branch prediction, 7 execution units |
| WinChip | 1997 | 4 | In-order execution |
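
Rows of this comparison can be handled as simple records when the data needs to be sorted or filtered. The following is a minimal Python sketch, not part of the original comparison: the `Row` type, `parse_stages`, `deepest`, and `with_feature` are hypothetical names chosen for illustration, and the sample holds only five rows copied from the table. It assumes a pipeline-stage cell is empty, a single number, or a range such as 8–11; cells like "15/16/17" or "Dual 7 stage" are not handled.

```python
from typing import NamedTuple, Optional, Tuple


class Row(NamedTuple):
    """One table row (hypothetical record layout for this sketch)."""
    name: str
    year: Optional[int]   # None where the table gives no year
    stages: str           # raw "Pipeline stages" cell, may be empty
    misc: str             # raw "Misc" cell


def parse_stages(cell: str) -> Optional[Tuple[int, int]]:
    """Return (min, max) stages for cells like '31' or '8–11', else None if empty."""
    cell = cell.strip()
    if not cell:
        return None
    # Normalise the en dash used in the table to a plain hyphen.
    low, _, high = cell.replace("–", "-").partition("-")
    return int(low), int(high or low)


# A handful of rows copied from the table above.
SAMPLE = [
    Row("P5 (Pentium)", 1993, "5", "Superscalar"),
    Row("ARM Cortex-A9 MPCore", 2007, "8–11",
        "Out-of-order, speculative issue, superscalar"),
    Row("NetBurst (Prescott)", 2004, "31", "2-way simultaneous multithreading"),
    Row("Intel Core", 2006, "12", "Multi-core, out-of-order, 4-way superscalar"),
    Row("Bobcat", 2011, "", "Out-of-order execution"),
]


def deepest(rows):
    """Row with the largest maximum pipeline depth, ignoring rows with no depth."""
    known = [(parse_stages(r.stages), r) for r in rows if parse_stages(r.stages)]
    return max(known, key=lambda pair: pair[0][1])[1]


def with_feature(rows, keyword):
    """Names of rows whose Misc cell mentions keyword (case-insensitive)."""
    return [r.name for r in rows if keyword.lower() in r.misc.lower()]


print(deepest(SAMPLE).name)
print(with_feature(SAMPLE, "out-of-order"))
```

Keeping the stage cell as raw text and parsing it on demand preserves entries the parser does not understand, which matters here because several rows use free-form descriptions instead of a number.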

Notes and References

  1. "Products We Design". amd.com. Retrieved 19 January 2014.
  2. "wp-content/uploads/2013/07/AMD-Steamroller-vs-Bulldozer". cdn3.wccftech.com. Archived from the original on 17 October 2013: https://web.archive.org/web/20131017014731/http://cdn3.wccftech.com/wp-content/uploads/2013/07/AMD-Steamroller-vs-Bulldozer.jpg. Retrieved 19 January 2014.
  3. "Cyrix 5x86 ("M1sc")". pcguide.com. Retrieved 19 January 2014.
  4. "Computer Science 246: Computer Architecture". P6 pipeline. Harvard University. Archived from the original on 24 December 2013: https://web.archive.org/web/20131224113122/http://www.eecs.harvard.edu/cs246/lectures/cs246-MOBROBP6R10K.pdf. Retrieved 23 December 2013.
  5. Intel Itanium 2 Processor Hardware Developer's Manual. p. 14. http://www.intel.com/design/itanium2/manuals/25110901.pdf (2002). Retrieved 28 November 2011.
  6. "Multi Core Processor SPARC64 Series : Fujitsu Global". fujitsu.com. Retrieved 19 January 2014.