Workload Manager

In IBM mainframes, Workload Manager (WLM) is a base component of the MVS/ESA mainframe operating system and its successors, up to and including z/OS. It controls access to system resources for the work executing on z/OS, based on administrator-defined goals. Workload Manager components also exist for other operating systems; for example, IBM Workload Manager is also a software product for the AIX operating system.

Workload Manager

On a mainframe computer many different applications execute at the same time. The expectations for this work are consistent execution times and predictable access to databases. On z/OS the Workload Manager (WLM) component fulfills these needs by controlling the work's access to system resources, based on external specifications provided by the system administrator.

The system administrator classifies work into service classes. The classification mechanism uses work attributes such as transaction names, user identifications, or program names that specific applications are known to use. In addition, the system administrator defines goals and importance levels for the service classes representing the application work. The goals define performance expectations for the work. Goals can be expressed as response times, as a relative speed (termed execution velocity), or as discretionary if no specific requirement exists. A response time describes the duration of a work request from the moment it enters the system until the application signals to WLM that execution has completed. WLM then tries to ensure that the average response time of a set of work requests meets the expected time, or that a defined percentage of the work requests completes within the time expected by the end user.
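The following sketch illustrates how classification rules and service-class goals of this kind could be modeled. It is a simplified illustration, not the actual z/OS service definition format; the data structures, field values, and the "SYSOTHER" fallback name are stand-ins for the constructs described above.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ServiceClass:
        name: str
        importance: int                       # 1 (highest) to 5 (lowest)
        goal_type: str                        # "response_time", "velocity", or "discretionary"
        response_time_seconds: Optional[float] = None
        percentile: Optional[float] = None    # e.g. 90.0 means "90% within the response time"
        velocity: Optional[int] = None        # target execution velocity

    @dataclass
    class ClassificationRule:
        subsystem: str                        # e.g. "CICS", "TSO", "BATCH"
        transaction_name: Optional[str]       # work attributes used for matching
        user_id: Optional[str]
        service_class: str                    # name of the service class to assign

    def classify(rules, subsystem, transaction_name=None, user_id=None):
        """Return the first service class whose rule matches the work attributes."""
        for rule in rules:
            if rule.subsystem != subsystem:
                continue
            if rule.transaction_name and rule.transaction_name != transaction_name:
                continue
            if rule.user_id and rule.user_id != user_id:
                continue
            return rule.service_class
        return "SYSOTHER"                     # catch-all class for unmatched work

    # A service class with a response time goal: 90% of requests within 0.5 seconds.
    online_high = ServiceClass("ONLHIGH", importance=1, goal_type="response_time",
                               response_time_seconds=0.5, percentile=90.0)

    rules = [
        ClassificationRule("CICS", "PAYR", None, "ONLHIGH"),
        ClassificationRule("BATCH", None, None, "BATLOW"),
    ]
    print(classify(rules, "CICS", transaction_name="PAYR"))   # -> ONLHIGH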

Defining a response time goal requires that the application communicates with WLM. If this is not possible, a relative speed measure, named execution velocity, is used to describe the end user's expectation of the system.

This measurement is based on system states which are continuously sampled. The system states describe when a work request uses a system resource and when it must wait for a resource because it is in use by other work; the latter is named a delay state. The execution velocity is the ratio of all using states to all productive states (using plus delay states), multiplied by 100. This measurement does not require any communication between the application and the WLM component, but it is more abstract than a response time goal.
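As a concrete illustration of the calculation just described (a minimal sketch; the sample counts are invented):

    def execution_velocity(using_samples: int, delay_samples: int) -> float:
        """Execution velocity = using / (using + delay) * 100."""
        productive = using_samples + delay_samples
        if productive == 0:
            return 0.0          # no productive samples observed in the interval
        return 100.0 * using_samples / productive

    # Example: work found using a resource in 120 samples and delayed in 280 samples.
    print(execution_velocity(120, 280))   # -> 30.0, i.e. an execution velocity of 30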

Finally, the system administrator assigns an importance to each service class to tell WLM which service classes should get preferred access to system resources if the system load is too high to allow all work to execute. The service classes and goal definitions are organized into service policies, together with other constructs for reporting and further control, and saved as a service definition for access by WLM. The active service definition is saved on a couple data set, which allows all z/OS systems of a Parallel Sysplex cluster to access it and execute toward the same performance goals.
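A rough picture of how service classes, service policies, and the service definition relate to each other is sketched below. The structures are simplified stand-ins for the real WLM constructs, and the activation step is reduced to local bookkeeping; in the real system the active definition is shared through the couple data set.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class ServicePolicy:
        name: str
        # names of the service classes (with goals and importance) that the policy contains
        service_class_names: List[str] = field(default_factory=list)

    @dataclass
    class ServiceDefinition:
        policies: Dict[str, ServicePolicy] = field(default_factory=dict)
        active_policy: str = ""               # the policy every system in the sysplex runs toward

        def activate(self, name: str) -> None:
            self.active_policy = name

    definition = ServiceDefinition()
    definition.policies["DAYTIME"] = ServicePolicy("DAYTIME", ["ONLHIGH", "BATLOW"])
    definition.activate("DAYTIME")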

WLM is a closed-loop control mechanism which continuously collects data about the work and the system resources, compares the collected and aggregated measurements with the user definitions from the service definition, and adjusts the work's access to the system resources if the user expectations have not been achieved. This mechanism runs continuously at pre-defined time intervals. In order to compare the collected data with the goal definitions, a performance index is calculated.
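The loop can be pictured roughly as follows. This is a hedged sketch: the interval length and the helper function names (collect_samples, compute_performance_indexes, adjust_resource_access) are hypothetical, and it assumes the convention, explained with the performance index below, that a value above 1.0 means a missed goal.

    import time

    ADJUSTMENT_INTERVAL_SECONDS = 10   # illustrative; the text only says "pre-defined time intervals"

    def control_loop(collect_samples, compute_performance_indexes, adjust_resource_access):
        """Collect measurements, compare them with the goals, adjust, and repeat."""
        while True:
            samples = collect_samples()                        # continuously gathered system states
            indexes = compute_performance_indexes(samples)     # one performance index per service class
            missed = {sc: pi for sc, pi in indexes.items() if pi > 1.0}
            if missed:                                         # expectations not achieved
                adjust_resource_access(missed)
            time.sleep(ADJUSTMENT_INTERVAL_SECONDS)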

The performance index for a service class is a single number which tells whether the goal definition has been met, overachieved, or missed. WLM modifies the access of the service classes to resources based on the achieved performance index and the importance. To do so it uses the collected data to project the effect of a change before making it. The change is executed only if the projection shows that it is beneficial for the work, based on the defined customer expectations. WLM uses measurement data covering from 20 seconds to 20 minutes to obtain a statistically relevant base of samples for its calculations. In addition, within one decision interval a change is performed for the benefit of only one service class, in order to maintain a controlled and predictable system.
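One common convention, assumed here purely for illustration, is that an index of 1.0 means the goal is exactly met, values below 1.0 mean overachievement, and values above 1.0 mean the goal was missed:

    def pi_response_time(achieved_seconds: float, goal_seconds: float) -> float:
        # Achieved times longer than the goal push the index above 1.0 (goal missed).
        return achieved_seconds / goal_seconds

    def pi_velocity(achieved_velocity: float, goal_velocity: float) -> float:
        # Achieving a higher velocity than the goal pushes the index below 1.0 (overachieved).
        return goal_velocity / achieved_velocity

    print(pi_response_time(0.8, 0.5))   # 1.6   -> response time goal missed
    print(pi_velocity(30.0, 20.0))      # ~0.67 -> velocity goal overachieved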

WLM controls the access of the work to the system processors, the I/O units, and the system storage, and it starts and stops processes for work execution. Access to the system processors, for example, is controlled by a dispatch priority which defines a relative ranking between the units of work that want to execute. The same dispatch priority is assigned to all units of work classified to the same service class. As already stated, the dispatch priority is not fixed and is not simply derived from the importance of the service class. It changes based on goal achievement, system utilization, and the demand of the work for the system processors. Similar mechanisms exist for controlling all other system resources. This way in which z/OS Workload Manager controls the access of work to system resources is named goal-oriented workload management. It contrasts with resource-entitlement-based workload management, which defines a much more static relationship describing how work can access the system resources and is found, for example, on larger UNIX operating systems.
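A heavily simplified sketch of one adjustment step follows: pick a "receiver" service class (the highest-importance class missing its goal) and help it at the expense of a "donor" that overachieves. The real mechanism projects the effect of a change before making it; here the donor selection and a plain priority swap stand in for that logic, and all names and numbers are invented.

    def pick_receiver(classes):
        missing = [c for c in classes if c["pi"] > 1.0]
        if not missing:
            return None
        # importance 1 is the most important, so sort by importance first, then worst index
        return sorted(missing, key=lambda c: (c["importance"], -c["pi"]))[0]

    def pick_donor(classes, receiver):
        candidates = [c for c in classes if c is not receiver and c["pi"] < 1.0]
        # take the donor that overachieves its goal the most
        return min(candidates, key=lambda c: c["pi"], default=None)

    def adjust(classes):
        receiver = pick_receiver(classes)
        if receiver is None:
            return
        donor = pick_donor(classes, receiver)
        if donor and donor["dispatch_priority"] > receiver["dispatch_priority"]:
            receiver["dispatch_priority"], donor["dispatch_priority"] = (
                donor["dispatch_priority"], receiver["dispatch_priority"])

    classes = [
        {"name": "ONLHIGH", "importance": 1, "pi": 1.4, "dispatch_priority": 240},
        {"name": "BATLOW",  "importance": 5, "pi": 0.6, "dispatch_priority": 250},
    ]
    adjust(classes)
    print(classes)   # ONLHIGH now runs at priority 250, BATLOW at 240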

A major difference to workload management components on other operating systems is the close cooperation between z/OS Workload Manager and the major applications; middleware and subsystems executing on z/OS. WLM offers interfaces which allow the subsystems to tell WLM when a unit of work starts and ends in the system and to pass classification attributes which can be used by the system administrator to classify the work on the system. In addition WLM offers interfaces which allow load balancing components to place work requests on the best suited system in a parallel sysplex cluster. Additional instrumentation exists which helps database and resource managers to signal contention situations to WLM so that WLM can help the delayed work by promoting the holder of resource locks and latches.
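The shape of such a begin/end reporting interface might look as follows. The function names and bookkeeping are purely hypothetical and do not correspond to the actual z/OS WLM services; the sketch only shows how arrival, classification attributes, and completion together yield a response time.

    import time

    _open_units = {}    # unit_id -> (classification attributes, start time)
    _completed = []     # (classification attributes, response time in seconds)

    def begin_work_unit(unit_id, subsystem, transaction_name=None, user_id=None):
        # The subsystem reports the arrival of a work request together with the
        # classification attributes the administrator's rules can match on.
        attributes = (subsystem, transaction_name, user_id)
        _open_units[unit_id] = (attributes, time.monotonic())

    def end_work_unit(unit_id):
        # The completion signal allows the response time to be computed, which is
        # what response time goals are measured against.
        attributes, start = _open_units.pop(unit_id)
        _completed.append((attributes, time.monotonic() - start))

    begin_work_unit(42, "CICS", transaction_name="PAYR", user_id="USER01")
    end_work_unit(42)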

Over time z/OS Workload Manager became the central control component for all performance-related aspects of the z/OS operating system. In a Parallel Sysplex cluster the z/OS Workload Manager components work together to provide a single-image view for the applications executing on the cluster. On a System z server with multiple logical partitions, z/OS WLM interoperates with the LPAR hypervisor to influence the weighting of the z/OS partitions and to control the amount of CPU capacity that the logical partitions can consume.
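The idea behind this cooperation can be sketched as shifting CPU weight from a partition that can afford it to one that needs it, while keeping the total weight constant. Partition names, step size, and minimum weights below are illustrative assumptions, not the actual algorithm.

    def shift_weight(partitions, receiver, donor, step=5):
        """Move `step` units of weight from donor to receiver, preserving the total."""
        total_before = sum(p["weight"] for p in partitions.values())
        if partitions[donor]["weight"] - step >= partitions[donor]["min_weight"]:
            partitions[donor]["weight"] -= step
            partitions[receiver]["weight"] += step
        assert sum(p["weight"] for p in partitions.values()) == total_before

    partitions = {
        "ZOSPROD": {"weight": 60, "min_weight": 30},
        "ZOSTEST": {"weight": 40, "min_weight": 10},
    }
    shift_weight(partitions, receiver="ZOSPROD", donor="ZOSTEST")
    print(partitions)   # ZOSPROD: 65, ZOSTEST: 35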
