NLTSS explained

Network Livermore Timesharing System (NLTSS)
Developer:	Lawrence Livermore Laboratory
Family:	capability-based
Working State:	Discontinued
Source Model:	Closed source
Discontinued:	Yes
Latest Release Version:	Final
Marketing Target:	Supercomputers
Programmed In:	Model (Pascal extension)
Language:	English
Update Model:	Compile from source code
Supported Platforms:	CDC 7600, Cray-1, Cray X-MP, Cray Y-MP
Kernel Type:	Microkernel
License:	Proprietary

The Network Livermore Timesharing System (NLTSS, also sometimes the New Livermore Time Sharing System) is an operating system that was actively developed at Lawrence Livermore Laboratory (now Lawrence Livermore National Laboratory) from 1979 until about 1988, though it continued to run production applications until 1995. An earlier system, the Livermore Time Sharing System, had been developed over a decade earlier.

NLTSS initially ran on a CDC 7600 computer, but it ran in production only on Cray computers, including the Cray-1, Cray X-MP, and Cray Y-MP models, from about 1985 until 1994.

Characteristics

The NLTSS operating system was unusual in many respects and unique in some.

Low-level architecture

NLTSS was a microkernel message-passing system. It was unique in that the kernel supported only one system call. That system call, which might be called "communicate" (it had no name because it did not need to be distinguished from other system calls), accepted a list of "buffer tables" (e.g., see The NLTSS Message System Interface)^[1] that contained control information for message communication, either sends or receives. Such communication, both locally within the system and across a network, was all that the kernel supported directly for user processes. The "message system" (supporting the one call and the network protocols) and the drivers for the disks and processor composed the entire kernel of the system.
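The single-call design can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the names `BufferTable`, `MessageSystem`, and every field are hypothetical, and the real buffer table layout and matching rules are not described here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class BufferTable:
    """Hypothetical control block for one send or one receive."""
    direction: str      # "send" or "receive"
    local: str          # address of the submitting endpoint (assumed naming)
    remote: str         # address of the peer endpoint
    data: bytes = b""   # payload for a send; filled in on a receive
    done: bool = False  # set by the message system on completion


class MessageSystem:
    """Toy stand-in for the kernel message system (not the real design)."""

    def __init__(self):
        self._in_flight = {}     # (source, destination) -> deque of payloads
        self._pending_recv = []  # receives still waiting for data

    def communicate(self, tables):
        """The only entry point: accept a list of buffer tables."""
        for t in tables:
            if t.direction == "send":
                key = (t.local, t.remote)
                self._in_flight.setdefault(key, deque()).append(t.data)
                t.done = True
            elif t.direction == "receive":
                self._pending_recv.append(t)
            else:
                raise ValueError(f"unknown direction: {t.direction!r}")
        self._match()

    def _match(self):
        """Complete any waiting receive whose peer has sent data."""
        still_waiting = []
        for r in self._pending_recv:
            queue = self._in_flight.get((r.remote, r.local))
            if queue:
                r.data = queue.popleft()
                r.done = True
            else:
                still_waiting.append(r)
        self._pending_recv = still_waiting
```

In this model a process never asks the kernel for anything except message transfer: opening a file or starting a process would be expressed as a send to a server plus a receive for its reply, both passed to `communicate` in one list.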

Mid-level architecture

NLTSS was a capability-based client–server system. The two primary servers were the file server and the process server. The file server was a process privileged to be trusted by the drivers for local storage (disk storage), and the process server was a process privileged to be trusted by the processor driver (software that switched time-sharing control between processes in the "alternator", handled interrupts for processes besides the "communicate" call, provided access to memory and process state for the process server, etc.).

NLTSS was a true network operating system in that its resource requests could come from local processes or remote processes anywhere on the network and the servers didn't distinguish them. A server's only means to make such distinctions would be by network address and they had no reason to make such distinctions. All requests to the servers appeared as network requests.

Communication between processes in NLTSS by convention used the Livermore Interactive Network Communication System (LINCS) protocol suite, which defined a protocol stack along the lines of the OSI reference model. The transport-level protocol for NLTSS and LINCS was named Delta-T. At the presentation level, LINCS defined standards for communicating numbered parameters as tokens (e.g., integers, capabilities, etc.) that were stored in a session-level record for processing in a mechanism resembling a remote procedure call.

The notion of a "user" was defined only peripherally in NLTSS. There was an "account server" that kept track of which users were using which resources (e.g., requests to create objects such as files or processes required an account capability). Access control was managed entirely with capabilities (communicable authority tokens).

File server

Any process could make requests to the file server for the creation of files (returning a file capability), ask to read or write files (by presenting a file capability), etc. For example, the act of reading a file generally required three buffer tables, one to send the request to the file server, one to receive the reply from the file server, and one to receive the data from the file. These three requests were generally submitted at one time to the message system, sometimes bundled with other requests. Control bits could be set in the buffer tables to awaken (unblock) a process whenever any of the buffer tables submitted were marked "Done". A library call to read a file would typically block until the control reply was received from the file server, though asynchronous I/O would of course not block and could check or block later. Any such differences on the user side were invisible to the file server.
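The three-buffer-table read described above might be assembled as follows. This is a hedged sketch: the `BufferTable` fields, the `build_read_request` and `should_wake` helpers, the `"file-server"` address, and the text-based request encoding are all invented for illustration (the real requests were LINCS token records).

```python
from dataclasses import dataclass


@dataclass
class BufferTable:
    """Hypothetical control block; field names are illustrative only."""
    direction: str                # "send" or "receive"
    peer: str                     # network address of the other endpoint
    data: bytes = b""             # payload to send, or data filled on receive
    wake_when_done: bool = False  # control bit: unblock caller on completion
    done: bool = False            # set by the message system when finished


def build_read_request(file_capability: bytes, offset: int, length: int,
                       file_server: str = "file-server") -> list:
    """Build the three buffer tables for one file read."""
    # Invented wire format, standing in for a LINCS token record.
    body = b"READ %d %d " % (offset, length) + file_capability.hex().encode()
    request = BufferTable("send", file_server, data=body)
    # A blocking library call waits on the control reply, so only the
    # reply table carries the wake bit in this sketch.
    reply = BufferTable("receive", file_server, wake_when_done=True)
    payload = BufferTable("receive", file_server)
    return [request, reply, payload]


def should_wake(tables) -> bool:
    """True once any table with its wake bit set has been marked done."""
    return any(t.done and t.wake_when_done for t in tables)
```

An asynchronous caller would submit the same three tables with no wake bit set and poll the `done` flags later; nothing in the request seen by the file server would differ.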

Process server

In NLTSS the process server was quite similar to the file server in that user processes could ask for the creation of processes, the starting or stopping of processes, reading or writing process memory or registers, and to be notified of process faults. The process server was an ordinary user mode process that was simply trusted to communicate with the CPU driver, just like the file server was trusted to communicate with the disk driver. The process server stored process state in files provided by the file server and in that regard appeared like any other user process to the file server.

Directory server

An example higher level server in NLTSS was the directory server. This server's task was to essentially turn files (invisible to the user) into directories that could be used to store and retrieve capabilities by name. Since capabilities were simply data this wasn't a particularly difficult task, consisting mostly of manipulating access permissions on the capabilities according to the conventions defined in the LINCS protocol suite. One place where this got a bit interesting was regarding an access permission named inheritance. If this bit was on (allowed), then capabilities could be fetched with their full access from the directory. If this bit was turned off (disallowed), then any permissions turned off in the directory capability were in turn turned off in the capability being fetched before it was returned to the requesting application. This mechanism allowed people to store, for example, read/write files in a directory, but to give other users only permission to fetch read-only instances of them.
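The inheritance rule amounts to a conditional bitmask. The sketch below assumes a made-up permission set (`READ`, `WRITE`, `INHERIT`) and a made-up `Capability` record; the actual LINCS permission bits and capability format are not given here.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Access(Flag):
    """Hypothetical permission bits; the real LINCS set is not shown here."""
    NONE = 0
    READ = auto()
    WRITE = auto()
    INHERIT = auto()


@dataclass(frozen=True)
class Capability:
    target: str     # name of the object the capability designates (assumed)
    access: Access  # permissions carried by this capability


def fetch(directory_cap: Capability, stored: Capability) -> Capability:
    """Return the stored capability as seen through directory_cap."""
    if Access.INHERIT in directory_cap.access:
        # Inheritance allowed: hand back the entry with its full access.
        return stored
    # Inheritance disallowed: any permission missing from the directory
    # capability is also removed from the fetched capability.
    return Capability(stored.target, stored.access & directory_cap.access)
```

For example, an owner holding the directory with `READ | WRITE | INHERIT` fetches a stored read/write file capability unchanged, while a guest holding the same directory with only `READ` receives a read-only instance of it.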

Development

The bulk of the programming for NLTSS was done in a Pascal extension developed at Los Alamos National Laboratory known as "Model". Model extended Pascal to include an abstract data type (object) mechanism and some other features.

NLTSS was saddled with a compatibility legacy. NLTSS followed the development and deployment of the Livermore Time Sharing System (LTSS) in the Livermore Computer Center at LLNL (~1968–1988?). NLTSS development began about the same time LTSS was ported to the Cray-1 to become the Cray Time Sharing System. To stay backward compatible with the many scientific applications at LLNL, NLTSS was forced to emulate the prior LTSS operating system's system calls. This emulation was implemented in the form of a compatibility library named "baselib". As one example, while the directory structure and thus the process structure for NLTSS was naturally a directed graph (process capabilities could be stored in directories just like file capabilities or directory capabilities), the baselib library emulated a simple linear (controller – controllee) process structure (not even a tree structure as in Unix) to stay compatible with the prior LTSS. Since scientific users never accessed NLTSS services outside the baselib library, NLTSS ended up looking nearly exactly like LTSS to its users. Most users weren't aware of capabilities, didn't realize that they could access resources across the network, and generally weren't aware that NLTSS offered any services beyond those of LTSS. NLTSS did support shared memory symmetric multiprocessing, a development that paralleled a similar development in the Cray Time Sharing System.

Even the name NLTSS was something of a legacy. The "New Livermore Time Sharing System" name was initially considered a temporary name to use during development. Once the system began to run some applications in a dual system mode (sort of a virtual machine sharing drivers with LTSS) a more permanent name, LIncs Network Operating System (LINOS), was chosen by the developers. Unfortunately, the management at LLNL decided that the name couldn't be changed at that point (seemingly because the prior term had been used in budget requests) so the temporary development NLTSS name stayed with the system throughout its lifetime.

A mass storage system was also developed in parallel with NLTSS that used the LINCS protocols (same file and directory protocols as NLTSS). This system/software was later commercialized as the Unitree product. Unitree was generally superseded by the High Performance Storage System (HPSS) that could loosely be considered a legacy of LINCS and NLTSS. For example, LINCS and NLTSS introduced a form of third party transfer (to copy file to file in NLTSS a process could send two requests to file servers, one to read and one to write and direct the file servers to transfer the data between themselves) that carried through in modified form to Unitree and HPSS.

Implementation and design issues

The biggest knock against NLTSS during its production lifetime was performance. The one performance issue that affected users most was file access latency. This generally wasn't a significant problem for disk input/output (I/O), but the systems that NLTSS ran on also supported a significant complement of very low latency solid state disks with access times under 10 microseconds. The initial latencies for file operations under NLTSS were comparable to the latency for solid state disk access and significantly higher than the LTSS latency for such access. To improve file access latency under NLTSS the implementation was changed significantly to put the most latency sensitive processes (in particular the file server) "in the kernel". This effort wasn't as significant as it might at first sound as all NLTSS servers worked on a multithreading model. What this change really amounted to was to move the threads responsible for file server services from a separate file server process into the kernel "process". Communication to users was unchanged (still through buffer tables, LINCS tokens, etc.), but file operations avoided some significant context changes that were the primary cause of the higher latencies over what the older LTSS and the competing Cray Time Sharing System provided. This change did significantly (~3x) improve the latency of file I/O operations, but it also meant that the file server became a trusted part of the kernel (by implementation, not by design).

A second implementation issue with NLTSS related to the security/integrity of its capability as data implementation. This implementation used a password capability model (e.g., see Control by Password).^[2] With this model any person or process that could get access to the memory space of a process would have the authority to access the capability represented by the data found in that memory. Some system architects (e.g., Andrew S. Tanenbaum, the architect of the Amoeba distributed operating system) have suggested that this property of access to memory implying access to capabilities is not an inherent problem. In the environment of NLTSS, it sometimes happened that people took program memory dumps to others for analysis. Because of this and other concerns, such password capabilities were considered a vulnerability in NLTSS. A design was done to protect against this vulnerability, the Control by Public Key Encryption^[3] mechanism. This mechanism wasn't put into production in NLTSS both because of its significant performance cost and because users weren't aware of the vulnerability from password capabilities. Modern advances in cryptography would make such protection for capabilities practical, especially for Internet/Web capabilities (e.g., see YURLs^[4] or WideWORD).^[5]
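The memory-dump concern follows directly from treating a capability as plain data. The following minimal sketch assumes a capability is an object identifier plus a random secret; `ToyServer`, the `id:secret` byte layout, and the 16-byte secret length are assumptions for illustration, not the NLTSS format.

```python
import hmac
import secrets


class ToyServer:
    """Toy server that issues and checks password capabilities."""

    def __init__(self):
        self._passwords = {}  # object id -> secret issued for that object

    def create(self, object_id: str) -> bytes:
        """Issue a capability: the object id followed by a random secret."""
        password = secrets.token_bytes(16)
        self._passwords[object_id] = password
        return object_id.encode() + b":" + password

    def check(self, capability: bytes) -> bool:
        """Accept any byte string that carries the right secret."""
        object_id, sep, password = capability.partition(b":")
        expected = self._passwords.get(object_id.decode(errors="replace"))
        if not sep or expected is None:
            return False
        # Constant-time comparison of the presented and stored secrets.
        return hmac.compare_digest(password, expected)
```

Because `check` looks only at the bytes presented, a copy of those bytes recovered from a memory dump is indistinguishable from the original capability, which is the vulnerability described above.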

A design issue with NLTSS that wasn't considered until years after it was removed from production was its open network architecture. In NLTSS, processes were treated as virtual processors in a network with no firewalls or other restrictions. Any process could communicate freely with any other. This meant that confinement was not possible even in the sense of limiting direct communication (as opposed to limiting covert channels such as "wall banging"). To correct this problem, NLTSS would have had to require capabilities to enable communication. Late development work on NLTSS, such as "stream numbers", was getting close to such a facility, but by the time active development stopped in 1988, communication in NLTSS was still unconfined.

External links

Capability Computing at LLNL: A discussion of the history of the use of capabilities in computing at LLNL, including a brief mention of the RATS system and how the development led to NLTSS
Stories of the Development of Large Scale Scientific Computing at Lawrence Livermore National Laboratory
The NLTSS Chronicles: Cartoons from the development of NLTSS and LINCS

Notes and References

[1] Components of a Network Operating System. webstart.com.
[2] Managing Domains in a Network Operating System. webstart.com.
[3] Managing Domains in a Network Operating System. webstart.com.
[4] YURL. Waterken Inc.
[5] Home. wideword.net.