OS-level virtualization explained

OS-level virtualization is an operating system (OS) virtualization paradigm in which the kernel allows the existence of multiple isolated user space instances, called containers (LXC, Solaris containers, AIX WPARs, HP-UX SRP Containers, Docker, Podman), zones (Solaris containers), virtual private servers (OpenVZ), partitions, virtual environments (VEs), virtual kernels (DragonFly BSD), or jails (FreeBSD jail or chroot jail).[1] Such instances may look like real computers from the point of view of programs running in them. A computer program running on an ordinary operating system can see all resources (connected devices, files and folders, network shares, CPU power, quantifiable hardware capabilities) of that computer. However, programs running inside a container can see only the container's contents and the devices assigned to it.

On Unix-like operating systems, this feature can be seen as an advanced implementation of the standard chroot mechanism, which changes the apparent root folder for the currently running process and its children. In addition to isolation mechanisms, the kernel often provides resource-management features to limit the impact of one container's activities on other containers. Linux containers are all based on the virtualization, isolation, and resource management mechanisms provided by the Linux kernel, notably Linux namespaces and cgroups.[2]
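The idea of an "apparent root folder" can be illustrated with a small path-translation sketch. This is only a toy model of the name mapping, not how any kernel implements chroot: the function name `resolve_in_root` and the directory `/srv/jail` are hypothetical, symbolic links are ignored, and the process's working directory is assumed to be the apparent root.

```python
import posixpath


def resolve_in_root(new_root: str, path: str) -> str:
    """Map a path as seen by a chrooted process to a path on the host.

    Toy model only: a real chroot is enforced by the kernel and must also
    handle symbolic links, which this sketch ignores.
    """
    # Anchor the path at "/" so that relative paths are counted from the
    # apparent root; normpath drops any ".." that would climb above "/".
    inside = posixpath.normpath(posixpath.join("/", path))
    # Re-anchor the cleaned path under the directory acting as the new root.
    return posixpath.normpath(new_root + "/" + inside)


if __name__ == "__main__":
    print(resolve_in_root("/srv/jail", "/etc/passwd"))
    print(resolve_in_root("/srv/jail", "../../etc/passwd"))
```

Both calls resolve to a location under `/srv/jail`: in this model, a path that tries to climb out with `..` still ends up inside the new root, which is the property that makes the changed root "apparent" to the process.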

The term container, while most popularly referring to OS-level virtualization systems, is sometimes ambiguously used to refer to fuller virtual machine environments operating in varying degrees of concert with the host OS, e.g., Microsoft's Hyper-V containers. A broader historical overview of virtualization since 1960 can be found in the Timeline of virtualization development.

Operation

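On ordinary operating systems for personal computers, a computer program can see (even though it might not be able to access) all the system's resources. These include connected devices, files and folders, network shares, CPU power, and other quantifiable hardware capabilities.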

The operating system may allow or deny access to such resources based on which program requests them and the user account in whose context it runs. The operating system may also hide those resources, so that when the computer program enumerates them, they do not appear in the enumeration results. Nevertheless, from a programming point of view, the computer program has interacted with those resources and the operating system has managed an act of interaction.

With operating-system-level virtualization, or containerization, it is possible to run programs within containers, to which only parts of these resources are allocated. A program expecting to see the whole computer, once run inside a container, can only see the allocated resources and believes them to be all that is available. Several containers can be created on each operating system, to each of which a subset of the computer's resources is allocated. Each container may contain any number of computer programs. These programs may run concurrently or separately, and may even interact with one another.
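The allocation model described above can be sketched as a small simulation. The `Host` and `Container` classes and the resource names below are hypothetical and purely illustrative; real systems enforce this in the kernel rather than with sets of strings.

```python
class Container:
    """A container that can see only the resources allocated to it."""

    def __init__(self, name, allocated):
        self.name = name
        self._allocated = frozenset(allocated)

    def enumerate_resources(self):
        # From inside the container, the allocated subset is all there is.
        return sorted(self._allocated)


class Host:
    """The whole computer: every resource, plus the containers it hosts."""

    def __init__(self, resources):
        self.resources = set(resources)
        self.containers = {}

    def create_container(self, name, allocated):
        # A container can only be given resources the host actually has.
        missing = set(allocated) - self.resources
        if missing:
            raise ValueError(f"host has no such resources: {sorted(missing)}")
        container = Container(name, allocated)
        self.containers[name] = container
        return container


if __name__ == "__main__":
    host = Host({"cpu0", "cpu1", "eth0", "/dev/sda", "/home"})
    web = host.create_container("web", {"cpu0", "eth0"})
    db = host.create_container("db", {"cpu1", "/dev/sda"})
    print(web.enumerate_resources())
    print(db.enumerate_resources())
```

Each container enumerates only its own subset, while the host object still tracks the full set of resources and every container created on it.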

Containerization resembles application virtualization; in the latter, however, only one computer program is placed in an isolated container and the isolation applies to the file system only.

Uses

Operating-system-level virtualization is commonly used in virtual hosting environments, where it is useful for securely allocating finite hardware resources among a large number of mutually-distrusting users. System administrators may also use it for consolidating server hardware by moving services from separate hosts into containers on a single server.
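On Linux, one concrete way such a hardware allocation is expressed is the cgroup v2 `cpu.max` file, which holds a quota and a period in microseconds (for example `50000 100000`), or the word `max` in place of the quota when no limit is set. The helper below is a minimal sketch of reading that format; the function name `parse_cpu_max` is an assumption for illustration, and the sketch parses a string rather than reading a real cgroup directory.

```python
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Turn the contents of a cgroup v2 cpu.max file into a CPU share.

    Returns the limit as a number of CPUs (0.5 means half of one CPU),
    or None when the quota is "max", meaning no limit is set.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    # Quota and period are both in microseconds; their ratio is the share.
    return int(quota) / int(period)


if __name__ == "__main__":
    print(parse_cpu_max("50000 100000\n"))
    print(parse_cpu_max("max 100000\n"))
```

A hosting provider could, under this scheme, give each customer's container a different quota while all of them share the same kernel and scheduler.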

Other typical scenarios include separating several programs into separate containers for improved security, hardware independence, and added resource management features.[3] The improved security provided by the use of a chroot mechanism, however, is not perfect.[4] Operating-system-level virtualization implementations capable of live migration can also be used for dynamic load balancing of containers between nodes in a cluster.

Overhead

Operating-system-level virtualization usually imposes less overhead than full virtualization because programs in OS-level virtual partitions use the operating system's normal system call interface and do not need to be subjected to emulation or be run in an intermediate virtual machine, as is the case with full virtualization (such as VMware ESXi, QEMU, or Hyper-V) and paravirtualization (such as Xen or User-mode Linux). This form of virtualization also does not require hardware support for efficient performance.

Flexibility

Operating-system-level virtualization is not as flexible as other virtualization approaches since it cannot host a guest operating system different from the host one, or a different guest kernel. For example, with Linux, different distributions are fine, but other operating systems such as Windows cannot be hosted. Operating systems using variable input systematics are subject to limitations within the virtualized architecture. Adaptation methods including cloud-server relay analytics maintain the OS-level virtual environment within these applications.[5]

Solaris partially overcomes the limitation described above with its branded zones feature, which provides the ability to run an environment within a container that emulates an older Solaris 8 or 9 version in a Solaris 10 host. Linux branded zones (referred to as "lx" branded zones) are also available on x86-based Solaris systems, providing a complete Linux user space and support for the execution of Linux applications; additionally, Solaris provides utilities needed to install Red Hat Enterprise Linux 3.x or CentOS 3.x Linux distributions inside "lx" zones.[6] [7] However, in 2010 Linux branded zones were removed from Solaris; in 2014 they were reintroduced in illumos, an open-source fork of Solaris, supporting 32-bit Linux kernels.[8]

Storage

Some implementations provide file-level copy-on-write (CoW) mechanisms. (Most commonly, a standard file system is shared between partitions, and those partitions that change the files automatically create their own copies.) This is easier to back up, more space-efficient and simpler to cache than the block-level copy-on-write schemes common on whole-system virtualizers. Whole-system virtualizers, however, can work with non-native file systems and create and roll back snapshots of the entire system state.
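File-level copy-on-write can be illustrated with a small in-memory sketch. The `CowPartition` class and the file names are hypothetical; real implementations operate on actual file systems, and this toy model uses a dictionary of byte strings in place of files.

```python
class CowPartition:
    """A partition that shares a base file set and copies files on write."""

    def __init__(self, shared):
        self._shared = shared   # base files shared by all partitions
        self._private = {}      # whole-file copies this partition has changed

    def read(self, path):
        # A private copy, if one exists, shadows the shared file.
        if path in self._private:
            return self._private[path]
        return self._shared[path]

    def append(self, path, data):
        # First modification: copy the whole file into private storage.
        if path not in self._private:
            self._private[path] = self._shared.get(path, b"")
        self._private[path] += data

    def private_files(self):
        return sorted(self._private)


if __name__ == "__main__":
    base = {"/etc/motd": b"hello\n", "/bin/sh": b"\x7fELF"}
    a, b = CowPartition(base), CowPartition(base)
    a.append("/etc/motd", b"from a\n")
    print(a.read("/etc/motd"))
    print(b.read("/etc/motd"))
    print(a.private_files(), b.private_files())
```

Only the partition that changed a file holds its own copy, which is why the approach is space-efficient, and the unit of copying is a whole file, which is what makes the private copies straightforward to back up.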

Implementations

Mechanism | Operating system | License | Actively developed since or between | Features (File system isolation, Copy on write, Disk quotas, I/O rate limiting, Memory limits, CPU quotas, Network isolation, Nested virtualization, Partition checkpointing and live migration, Root privilege isolation)
chroot | Most UNIX-like operating systems | Varies by operating system | 1982 |
Docker | Linux,[9] Windows x64,[10] macOS[11] | | 2013 |
Linux-VServer (security context) | Linux, Windows Server 2016 | | 2001 |
lmctfy | Linux | | 2013–2015 |
LXC | Linux | | 2008 | [12]
Singularity | Linux | | 2015 | [13] [14] [15]
OpenVZ | Linux | | 2005 | [16]
Virtuozzo | Linux, Windows | | 2000[17] |
Solaris Containers (Zones) | illumos (OpenSolaris), Solaris | …, Proprietary | 2004 | (ZFS) [18] [19]
FreeBSD jail | FreeBSD, DragonFly BSD | | 2000[20] | (ZFS) [21] [22] [23] [24] [25]
vkernel | DragonFly BSD | | 2006[26] | [27] [28] [29]
sysjail | OpenBSD, NetBSD | | 2006–2009 |
WPARs | AIX | | 2007 | [30]
iCore Virtual Accounts | Windows XP | | 2008 |
 | Windows | | 2004 |
systemd-nspawn | Linux | | 2010 | [31] [32]
Turbo | Windows | | 2012 |
rkt (rocket) | Linux | | 2014[33]–2018 |

Linux containers not listed above include LXD,[34] Podman,[35] Charliecloud,[36] Kata Containers,[37] and Bottlerocket.[38]

Notes and References

  1. Web site: Software containers: Used more frequently than most realize . Hogg . Scott . 2014-05-26 . Network world, Inc. . 2015-07-09 . "There are many other OS-level virtualization systems such as: Linux OpenVZ, Linux-VServer, FreeBSD Jails, AIX Workload Partitions (WPARs), HP-UX Containers (SRP), Solaris Containers, among others."
  2. Web site: Namespaces and Cgroups, the basis of Linux Containers. Rosen. Rami. 18 August 2016.
  3. Web site: 2022-10-20 . Secure Bottlerocket deployments on Amazon EKS with KubeArmor Containers . 2023-06-20 . aws.amazon.com . en-US.
  4. Book: Mastering FreeBSD and OpenBSD security . O'Reilly Series . Yanek . Korff . Paco . Hope . Bruce . Potter . O'Reilly Media, Inc. . 2005 . 0596006268 . 59 .
  5. Book: Huang . D. . Proceedings of the 10th Parallel Data Storage Workshop . Experiences in using os-level virtualization for block I/O . 2015. 13–18 . 10.1145/2834976.2834982 . 9781450340083 . 3867190 .
  6. Web site: System administration guide: Oracle Solaris containers-resource management and Oracle Solaris zones, Chapter 16: Introduction to Solaris zones . 2010 . 2014-09-02 .
  7. Web site: System administration guide: Oracle Solaris containers-resource management and Oracle Solaris zones, Chapter 31: About branded zones and the Linux branded zone . 2010 . 2014-09-02 .
  8. Web site: The dream is alive! Running Linux containers on an illumos kernel . 2014-09-28 . 2014-10-10 . Bryan Cantrill . slideshare.net .
  9. Web site: Docker drops LXC as default execution environment . InfoQ .
  10. Web site: 9 February 2023 . Install Docker desktop on Windows Docker documentation . Docker .
  11. Web site: Get started with Docker desktop for Mac . December 6, 2019 . Docker documentation.
  12. Web site: Graber . Stéphane . LXC 1.0: Security features [6/10] . 12 February 2014 . 1 January 2014 . "LXC now has support for user namespaces. [...] LXC is no longer running as root so even if an attacker manages to escape the container, he'd find himself having the privileges of a regular user on the host."
  13. Web site: Sylabs brings Singularity containers into commercial HPC | Top 500 supercomputer sites . www.top500.org .
  14. Web site: SIF — Containing your containers . www.sylabs.io . 14 March 2018 .
  15. Singularity: Scientific containers for mobility of compute . Gregory M. . Kurtzer . Vanessa . Sochat . Michael W. . Bauer . May 11, 2017 . PLOS ONE . 12 . 5 . e0177459 . 10.1371/journal.pone.0177459 . 28494014 . 5426675 . 2017PLoSO..1277459K . free.
  16. Web site: Bronnikov . Sergey . Comparison on OpenVZ wiki page . OpenVZ Wiki . OpenVZ . 28 December 2018.
  17. Web site: Initial public prerelease of Virtuozzo (named ASPcomplete at that time).
  18. http://www.opensolaris.org/os/project/crossbow/faq/ Network virtualization and resource control (Crossbow) FAQ
  19. Web site: Managing network virtualization and network resources in Oracle® Solaris 11.2 . docs.oracle.com .
  20. Web site: Contain your enthusiasm - Part two: Jails, zones, OpenVZ, and LXC . Jails were first introduced in FreeBSD 4.0 in 2000 .
  21. Web site: Hierarchical resource limits - FreeBSD Wiki . Wiki.freebsd.org . 2012-10-27 . 2014-01-15 .
  22. Web site: Implementing a clonable network stack in the FreeBSD kernel . usenix.org . 2003-06-13 .
  23. Web site: VPS for FreeBSD . 2016-02-20 .
  24. Web site: [Announcement] VPS // OS virtualization // alpha release . 31 August 2012 . 2016-02-20 .
  25. Web site: 3.5. Limiting your program's environment . Freebsd.org . 2014-01-15 .
  26. Web site: Matthew Dillon . 2006 . sys/vkernel.h . BSD cross reference .
  27. Web site: vkd(4) — Virtual kernel disc . "treats the disk image as copy-on-write." .
  28. Web site: Sascha Wildner . 2007-01-08 . vkernel, vcd, vkd, vke — virtual kernel architecture . DragonFly miscellaneous information manual . DragonFly BSD.
  29. Web site: vkernel, vcd, vkd, vke - virtual kernel architecture . DragonFly On-Line Manual Pages .
  30. Web site: Live application mobility in AIX 6.1 . June 3, 2008 . www.ibm.com.
  31. Web site: systemd-nspawn . www.freedesktop.org .
  32. Web site: 2.3. Modifying control groups Red Hat Enterprise Linux 7 . Red Hat Customer portal.
  33. Web site: Polvi . Alex . CoreOS is building a container runtime, rkt . https://web.archive.org/web/20190401013449/https://coreos.com/blog/rocket.html . 2019-04-01 . CoreOS Blog . 12 March 2019.
  34. Web site: 2021-02-11 . LXD . linuxcontainers.org .
  35. https://indico.cern.ch/event/757415/contributions/3421994/attachments/1855302/3047064/Podman_Rootless_Containers.pdf Rootless containers with Podman and fuse-overlayfs
  36. Web site: Overview — Charliecloud 0.25 documentation . 4 October 2020 .
  37. Web site: Home . katacontainers.io.
  38. Web site: Bottlerocket is a Linux-based operating system purpose-built to run containers .