PSE-36 explained

In computing, PSE-36 (36-bit Page Size Extension)[1] refers to a feature of x86 processors that extends physical memory addressing from 32 bits to 36 bits, allowing up to 64 GB of memory to be addressed.[2] Compared to the Physical Address Extension (PAE) method, PSE-36 is a simpler way of addressing more than 4 GB of memory. It uses the Page Size Extension (PSE) mode and a modified page directory table to map 4 MB pages into a 64 GB physical address space. PSE-36's downside is that, unlike PAE, it doesn't offer 4 KB page granularity above the 4 GB mark.

PSE-36 was introduced into the x86 architecture with the Pentium II Xeon and was initially advertised as part of the "Intel Extended Server Memory Architecture"[3] (sometimes abbreviated ESMA[4]), a branding which also included the slightly older PAE (and thus the Pentium Pro, which only supported PAE, was advertised as having only "subset support" for ESMA).[1]

The heyday of PSE-36 was relatively brief. PSE-36's main advantage was that, unlike PAE, it required little rework of the operating system's internals, and thus PSE-36 proved a suitable stopgap measure around the Windows NT 4.0 Enterprise Edition timeframe. Newer Microsoft operating systems, including Windows 2000, support only PAE. Some operating systems, such as Linux, skipped PSE-36 entirely.[5] Despite this, AMD and later Intel chose to provide PSE support for up to 40 address bits in their 64-bit processors when operated in legacy mode.

Operation

Detection

Support for PSE-36 is indicated by bit 17 (counting from 0) of the EDX register in the CPUID result for feature bits (CPUID function 01H). This is a different bit from plain PSE support, which is indicated by bit 3 in the same register.[6] [7]
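
The bit test described above can be sketched as follows. This is a minimal illustration, not a detection routine: Python cannot execute CPUID directly, so the functions take an EDX value that is assumed to have been obtained elsewhere. The names has_feature and describe_pse_support, and the sample EDX values, are hypothetical.

```python
# Bit positions in EDX after executing CPUID with EAX = 1 (feature bits).
PSE_BIT = 3     # plain Page Size Extension
PSE36_BIT = 17  # 36-bit Page Size Extension


def has_feature(edx: int, bit: int) -> bool:
    """Return True if the given bit (counting from 0) is set in edx."""
    return (edx >> bit) & 1 == 1


def describe_pse_support(edx: int) -> str:
    """Summarise PSE / PSE-36 support for a raw EDX feature word."""
    if has_feature(edx, PSE36_BIT):
        return "PSE-36"
    if has_feature(edx, PSE_BIT):
        return "PSE only"
    return "no PSE"


# Hypothetical EDX values, chosen only to exercise the three cases.
for sample in (0x00000000, 0x00000008, 0x00020008):
    print(f"EDX={sample:#010x}: {describe_pse_support(sample)}")
```

The two features are reported independently, which is why the sketch tests bit 17 on its own rather than inferring it from bit 3.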

Activation and use

There is no separate bit for activating PSE-36; it shares the bit that turns on PSE.[7] As long as the processor (as indicated by CPUID) and the chipset support PSE-36, enabling PSE alone (by setting bit 4, PSE, of the system register CR4) allows the use of large 4 MB pages (in the 64 GB range) along with normal 4 KB pages (which are, however, restricted to the 4 GB range).[7]

If the newer PSE-36 capability is available on the CPU, as checked using the CPUID instruction, then 4 more bits, in addition to the 10 bits used in PSE, are used inside a page directory entry pointing to a large page. This allows a large page to be located anywhere in the 36-bit physical address space.[7]

The PS bit (bit 7) in the Page Directory Entry (PDE) denotes whether this entry refers to a page table (that describes 1024 4-KiB pages) or one 4 MB page. PDE structures in normal mode, PSE mode, and PSE-36 mode are as follows:

Page Directory Entry for 32-bit paging

Bits    non-PSE                      PSE                                 PSE-36
31-22   base address of page table   bits 31..22 of page frame address   bits 31..22 of page frame address
21-17   base address of page table   reserved (must be zero)             reserved (must be zero)
16-13   base address of page table   reserved (must be zero)             bits 35..32 of page frame address
12      base address of page table   PAT                                 PAT
11-9    avail                        avail                               avail
8       0                            0                                   0
7       PS=0                         PS=1                                PS=1
6       ign                          D                                   D
5       A                            A                                   A
4       PCD                          PCD                                 PCD
3       PWT                          PWT                                 PWT
2       U                            U                                   U
1       W                            W                                   W
0       P                            P                                   P

  PAT: Page attribute table; since Pentium III, must be zero for older CPUs.
  D: "Dirty" bit: set to 1 by CPU if there was a write access to that page. For 4 KiB pages this flag exists in the corresponding page table entry (PTE).
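
The PSE-36 entry layout can be illustrated with a short sketch that packs a 36-bit page frame address into a 32-bit PDE and unpacks it again. It assumes only the bit positions given above (address bits 31..22 in place, address bits 35..32 in PDE bits 16..13, PS in bit 7); the function names and the sample addresses are hypothetical, and no real page tables are touched.

```python
PDE_P = 1 << 0        # present
PDE_W = 1 << 1        # writable
PDE_PS = 1 << 7       # page size: 1 = this entry maps a 4 MB page
PAGE_4M = 1 << 22     # size of a large page in bytes
LOW_MASK = 0xFFC00000 # PDE bits 31..22 = address bits 31..22


def encode_pse36_pde(page_base: int, flags: int = PDE_P | PDE_W) -> int:
    """Build a PSE-36 PDE for a 4 MB page at a 36-bit physical address."""
    if page_base % PAGE_4M != 0:
        raise ValueError("page base must be 4 MB aligned")
    if page_base >> 36:
        raise ValueError("page base does not fit in 36 bits")
    low = page_base & LOW_MASK        # address bits 31..22 stay in place
    high = (page_base >> 32) & 0xF    # address bits 35..32
    return low | (high << 13) | PDE_PS | flags


def decode_pse36_pde(pde: int) -> int:
    """Recover the 36-bit page frame address from a PSE-36 PDE."""
    if not pde & PDE_PS:
        raise ValueError("PS=0: entry points to a page table, not a 4 MB page")
    return (((pde >> 13) & 0xF) << 32) | (pde & LOW_MASK)


def translate(pde: int, linear: int) -> int:
    """Combine the page frame address with the 22-bit offset of a linear address."""
    return decode_pse36_pde(pde) | (linear & (PAGE_4M - 1))


pde = encode_pse36_pde(0x8_0040_0000)       # hypothetical page just above 32 GB
print(hex(pde))                             # 0x410083
print(hex(translate(pde, 0x12345678)))      # 0x800745678
```

Because the four extra address bits reuse a field that was previously reserved, the entry stays 32 bits wide and the page directory keeps its 1024-entry shape.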

Extension up to 40 bits

In their AMD64 processors operated in legacy mode, AMD extends this scheme to 40 address bits by interpreting bits 20..13 of a PDE as bits 39..32 of the page base address, so that only bit 21 remains reserved (must be zero). Note, however, that CR4.PSE is ignored in long mode and PSE-style 4 MB pages are not available in that mode.[8] The total amount of physical memory addressable in AMD64 legacy mode using PSE 4 MB pages is thus 1024 GB.[9] Tom Shanley has called this extension PSE-40,[9] although such a designation does not appear in the official AMD documentation.[8]
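
The 40-bit variant differs from PSE-36 only in the width of the high address field. The following sketch decodes such an entry and checks the 1024 GB figure; it assumes the bit assignment stated above, and the function name and sample entry are hypothetical.

```python
PDE_PS = 1 << 7          # page size: 1 = this entry maps a 4 MB page
PDE_RESERVED_21 = 1 << 21
LOW_MASK = 0xFFC00000    # PDE bits 31..22 = address bits 31..22


def decode_pse40_pde(pde: int) -> int:
    """Recover a 40-bit page base from a legacy-mode PDE (bits 20..13 = address bits 39..32)."""
    if not pde & PDE_PS:
        raise ValueError("PS=0: entry points to a page table, not a 4 MB page")
    if pde & PDE_RESERVED_21:
        raise ValueError("bit 21 is reserved and must be zero")
    high = (pde >> 13) & 0xFF        # eight bits instead of the four used by PSE-36
    return (high << 32) | (pde & LOW_MASK)


# Hypothetical entry with every address bit set: the highest possible 4 MB page.
top_pde = LOW_MASK | (0xFF << 13) | PDE_PS
top_base = decode_pse40_pde(top_pde)
print(hex(top_base))                      # 0xffffc00000
print((top_base + (1 << 22)) // 2**30)    # 1024 (GB addressable with 40 bits)
```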

The latest Intel manuals (February 2014) also indicate support for up to 40 bits in PSE. The exact number of physical address bits available through PSE on Intel CPUs can be lower, however, and must be determined by invoking CPUID with function 80000008H and reading the maximum physical-address width supported by the processor from EAX[7:0].[10]
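
A small sketch of that width calculation follows, using the rule quoted from the Intel manual in reference 10 (the width is 32 without PSE-36, otherwise the minimum of 40 and MAXPHYADDR). The function name and the sample EAX values are hypothetical, and the register value is assumed to have been read by other means, since Python cannot execute CPUID.

```python
def pse_address_bits(pse36_supported: bool, cpuid_80000008_eax: int) -> int:
    """Number of physical address bits usable for 4 MB PSE pages.

    cpuid_80000008_eax is the raw EAX value returned by CPUID function
    80000008H; its low byte is MAXPHYADDR.
    """
    if not pse36_supported:
        return 32
    maxphyaddr = cpuid_80000008_eax & 0xFF   # EAX[7:0]
    return min(40, maxphyaddr)


# Hypothetical EAX values: low byte 0x24 = 36, 0x28 = 40, 0x30 = 48.
for eax in (0x3024, 0x3028, 0x3030):
    bits = pse_address_bits(True, eax)
    print(f"EAX={eax:#06x}: {bits} bits, {(1 << bits) // 2**30} GB")
```

The cap at 40 reflects the size of the field available in the page directory entry, so a processor with a wider MAXPHYADDR still cannot place PSE pages above that limit.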

Usage

The practical usefulness of the PSE-36 feature depends on chipset support for more than 4 GB of RAM. Most chipsets from the Pentium II timeframe did not support this much memory: the typical desktop chipset, the Intel 440BX, supported at most 1 GB, and the 440GX workstation chipset at most 2 GB. Only the high-end server Intel 450NX chipset supported 8 GB.[2] [11] Support for PSE-36 (ESMA) was thus usually advertised for servers.[3]

In 1998 Intel advertised Microsoft Windows NT Server, Enterprise Edition 4.0 and supposedly the upcoming NT 5.0 as operating systems suitable for PSE-36. Both were to enable its use via a PSE36 device driver, which kept most of the operating system unaware of PSE-36 (only the PSE36 driver enabled it temporarily) and which had to be called by applications that wanted to access more than 4 GB.[9] Windows NT 4.0 Enterprise Edition thus used the PSE-36 feature essentially as a RAM disk. The PSE36 driver was used by some applications on Windows NT 4.0 Enterprise Edition servers, for example SAP liveCache,[12] Microsoft SQL Server 7.0, Oracle 8.1.5,[13] and IBM DB2. The tuning documentation for the latter noted however that "Unfortunately in most cases performance gains obtained using the PSE-36 driver are not spectacular. In many cases the server will run slower with 8 GB using the PSE-36 driver than it runs with 4 GB without the driver. [...] After more than a year of experimentation and tuning, Microsoft and IBM dropped support for PSE-36 due to insufficient performance gains. The driver is still available for vendors from Intel, but it is not useful for end customer use."[14]

Windows 2000 (NT 5.0) ended up not supporting PSE-36,[15] due to low performance when compared with the alternative PAE.[16] Windows 2000 also replaced the API of the PSE36 driver with a new API called Address Windowing Extensions (AWE), which used PAE underneath.[15] [13] (AWE was only available in the Datacenter Server and Advanced Server editions of Windows 2000.) Windows applications consequently migrated to this new API, e.g. starting with Oracle 8.1.6[13] or MS SQL Server 2000.[15]

PSE-36 was never used by Linux.

Compared to PAE

Physical Address Extension (PAE) is an alternative to PSE-36 which also allows 36-bit addressing. PSE-36 has the advantages that the hierarchy of page tables is not changed, and that page entries keep their old 32-bit format and are not extended to 64 bits. The obvious disadvantage of PSE-36 is that only large pages can be located in 64 GB of physical memory, and small pages can still be located only in the first 4 GB of physical memory.[16]

Intel Extended Server Memory Architecture

The Intel Extended Server Memory Architecture is defined to include two 36-bit addressing modes in the core processor: PAE-36 and PSE-36.

Notes and References

  1. Web site: The Intel Extended Server Memory Architecture. 1998. Intel Order Number: 243846-001. 2014-03-01.
  2. Web site: Netinfinity Performance Tuning with Windows NT 4.0. ftp://www.redbooks.ibm.com/redbooks/SG245287/sg245287.pdf. Redbooks.ibm.com. 2014-03-01. 51–52.
  3. Deni Connor. Here come the eight-way Xeon servers. Network World: The Leader in Network Knowledge. 7 December 1998. Network World. 19. 0887-7661.
  4. Book: Michael Missbach. Uwe M. Hoffmann. SAP Hardware Solutions. 2000. Prentice Hall Professional. 978-0-13-028084-8. 62.
  5. Book: Daniel P. Bovet. Marco Cesati. Understanding the Linux Kernel. 17 November 2005. "O'Reilly Media, Inc.". 978-0-596-55491-0. 52.
  6. Web site: Intel Processor Identification and the CPUID Instruction. ftp://download.intel.com/support/processors/procid/24161812.pdf.
  7. Book: Tom Shanley. The Unabridged Pentium 4: IA32 Processor Genealogy. 2005. Addison Wesley Professional. 978-0-321-24656-1. 732–736.
  8. Web site: Volume 2: System Programming. AMD Corporation. September 2012. AMD64 Architecture Programmer's Manual. AMD Corporation. 2014-02-17. 25–26 and 125–126. 3.22.
  9. Book: Tom Shanley. x86 Instruction Set Architecture. 578–579. 9780977087853. MindShare Press. 2009.
  10. Web site: Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1. "4-5" and "4-11". "If the PSE-36 mechanism is not supported, M is 32, and this row does not apply. If the PSE-36 mechanism is supported, M is the minimum of 40 and MAXPHYADDR (this row does not apply if MAXPHYADDR = 32). See Section 4.1.4 for how to determine MAXPHYADDR and whether the PSE-36 mechanism is supported. [...] CPUID.80000008H:EAX[7:0] reports the physical-address width supported by the processor. (For processors that do not support CPUID function 80000008H, the width is generally 36 if CPUID.01H:EDX.PAE [bit 6] = 1 and 32 otherwise.) This width is referred to as MAXPHYADDR. MAXPHYADDR is at most 52.".
  11. Web site: Intel's Pentium II Xeon Processor. The New Chipsets For The Pentium II Xeon. http://www.tomshardware.com/reviews/intel,69-3.html.
  12. Web site: How does the liveCache < 7.4 use PSE36/AWE . Stechno.net . 2003-04-04 . 2014-03-01.
  13. Web site: Michael R. Ault. Increasing Available Memory in Linux and Windows. ROBO Books White Paper . 2003-02-17 . 2014-03-01. 10–12.
  14. Book: Tuning IBM xSeries Servers for Performance. IBM SG24-5287-02. 3rd. June 2002. 97. Archived at https://web.archive.org/web/20140303011729/http://www-03.ibm.com/systems/kr/resources/systems_kr_x_techsupport_Tuning_xSeries_for_Performance.pdf. 2014-03-03.
  15. Book: Sajal Dam. SQL Server Query Performance Tuning Distilled. 2004. Apress. 978-1-4302-0407-7. 28.
  16. Web site: Operating Systems and PAE Support . Msdn.microsoft.com . 2006-07-14 . 2014-03-01.