Unix filesystem explained

In Unix and operating systems inspired by it, the file system is considered a central component of the operating system.[1] It was also one of the first parts of the system to be designed and implemented by Ken Thompson in the first experimental version of Unix, dated 1969.

As in other operating systems, the filesystem provides information storage and retrieval. It also serves as one of several forms of interprocess communication: the many small programs that traditionally form a Unix system can store information in files for other programs to read, although pipes have complemented files in this role since the Third Edition. In addition, the filesystem provides access to other resources through so-called device files, which are entry points to terminals, printers, and mice.

The rest of this article uses Unix as a generic name to refer to both the original Unix operating system and its many workalikes.

Principles

The filesystem appears as one rooted tree of directories.[1] Instead of addressing separate volumes such as disk partitions, removable media, and network shares as separate trees (as done in DOS and Windows: each drive has a drive letter that denotes the root of its file system tree), such volumes can be mounted on a directory, causing the volume's file system tree to appear as that directory in the larger tree.[1] The root of the entire tree is denoted /.
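As a rough illustration of how mounted volumes show up inside the single tree, the sketch below tests whether a directory is the root of a mounted volume by comparing device numbers with its parent. The function name is_mount_point is hypothetical, and the check is a simplification: it assumes a POSIX system and will not detect bind mounts within the same filesystem. Python's standard library offers os.path.ismount for the same purpose.

```python
import os


def is_mount_point(path):
    """Return True if `path` looks like the root of a mounted volume.

    Heuristic: a directory is a mount point when it lives on a different
    device than its parent, or when it is its own parent (the root, /).
    """
    if os.path.islink(path):
        return False  # a symlink is never itself a mount point
    here = os.lstat(path)
    parent = os.lstat(os.path.join(path, ".."))
    if here.st_dev != parent.st_dev:
        return True  # the path crosses into another volume
    # Same device: only the root of the whole tree is its own parent.
    return here.st_ino == parent.st_ino


if __name__ == "__main__":
    # Which of these are separate volumes depends entirely on the machine.
    for candidate in ("/", "/usr", "/home", "/tmp"):
        if os.path.isdir(candidate):
            print(candidate, is_mount_point(candidate))
```

On a system where /usr or /home is a separate partition, those directories would be reported as mount points even though they are addressed like any other directory under /.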

In the original Bell Labs Unix, a two-disk setup was customary, where the first disk contained startup programs, while the second contained users' files and programs. This second disk was mounted at the empty directory named usr on the first disk, causing the two disks to appear as one filesystem, with the second disk’s contents viewable at /usr.

Unix directories do not contain files. Instead, they contain the names of files paired with references to so-called inodes, which in turn contain both the file and its metadata (owner, permissions, time of last access, etc., but no name). Multiple names in the file system may refer to the same file, a feature termed a hard link.[1] The mathematical traits of hard links make the file system a limited type of directed acyclic graph, although the directories still form a tree, as they typically may not be hard-linked. (As originally envisioned in 1969, the Unix file system would in fact be used as a general graph with hard links to directories providing navigation, instead of path names.[2])
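The relationship between names and inodes can be observed from user space. The following sketch, which assumes a POSIX system and a filesystem that supports hard links, creates a file, gives it a second name, and checks that both names refer to the same inode. The function name and file names are illustrative only.

```python
import os
import tempfile


def hard_link_demo():
    """Create two names for one file and report what the inode shows."""
    with tempfile.TemporaryDirectory() as workdir:
        first = os.path.join(workdir, "report.txt")
        second = os.path.join(workdir, "alias.txt")
        with open(first, "w") as f:
            f.write("hello\n")

        # Add a second directory entry that refers to the same inode.
        os.link(first, second)
        a, b = os.stat(first), os.stat(second)
        same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
        links_before = a.st_nlink  # number of names referring to the inode

        # Removing one name leaves the file reachable through the other.
        os.unlink(first)
        with open(second) as f:
            content = f.read()
        links_after = os.stat(second).st_nlink
        return same_inode, links_before, content, links_after


if __name__ == "__main__":
    print(hard_link_demo())
```

Neither name is the "original": the file's data is released only once the link count drops to zero and no process has the file open.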

File types

See main article: Unix file types.

The original Unix file system supported three types of files: ordinary files, directories, and "special files", also termed device files.[1] The Berkeley Software Distribution (BSD) and System V each added a file type to be used for interprocess communication: BSD added sockets,[3] while System V added FIFO files.
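The file type is recorded in the inode's mode field and can be queried with stat. The sketch below maps a path to one of the five types named above using Python's stat module; the function name file_type and the returned labels are assumptions made for this example. It uses os.stat, which follows links, so it reports the type of the file a name ultimately refers to.

```python
import os
import stat


def file_type(path):
    """Classify `path` as one of the file types described in the text."""
    mode = os.stat(path).st_mode
    if stat.S_ISREG(mode):
        return "ordinary file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "special (device) file"  # character or block device
    if stat.S_ISFIFO(mode):
        return "FIFO file"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "other"


if __name__ == "__main__":
    # The paths are examples; not all of them exist on every system.
    for path in ("/etc", "/dev/null", "/etc/passwd"):
        if os.path.exists(path):
            print(path, "->", file_type(path))
```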

BSD also added symbolic links (often termed "symlinks") to the range of file types, which are files that refer to other files, and complement hard links.[3] Symlinks were modeled after a similar feature in Multics,[4] and differ from hard links in that they may span filesystems and that their existence is independent of the target object. Other Unix systems may support additional types of files.
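The independence of a symlink from its target can be shown in a few lines. This sketch assumes a POSIX system; the function and file names are hypothetical. It creates a symlink, removes the target, and checks that the link itself still exists even though it no longer resolves.

```python
import os
import tempfile


def symlink_demo():
    """Show that a symlink stores a path and survives its target."""
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "target.txt")
        link = os.path.join(workdir, "link.txt")
        with open(target, "w") as f:
            f.write("data\n")

        os.symlink(target, link)
        stores_path = os.readlink(link) == target  # the link holds a path
        is_link = os.path.islink(link)

        # Delete the target: the link remains but now dangles.
        os.unlink(target)
        still_listed = os.path.lexists(link)  # the link entry is still there
        resolves = os.path.exists(link)       # following it now fails
        return stores_path, is_link, still_listed, resolves


if __name__ == "__main__":
    print(symlink_demo())
```

A hard link behaves differently in the same situation: because it refers to the inode rather than to a path, removing one name never leaves the other dangling.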

Conventional directory layout

Certain conventions exist for locating some kinds of files, such as programs, system configuration files, and users' home directories. These were first documented in the hier(7) man page in Version 7 Unix; subsequent versions, derivatives and clones typically have a similar man page.[5][6][7]

The details of the directory layout have varied over time. Although the file system layout is not part of the Single UNIX Specification, several attempts exist to standardize (parts of) it, such as the System V Application Binary Interface, the Intel Binary Compatibility Standard, the Common Operating System Environment, and Linux Foundation's Filesystem Hierarchy Standard (FHS).[8]

Here is a generalized overview of common locations of files on a Unix operating system:

Directory or file

Description

/

The slash / character alone denotes the root of the filesystem tree.

/bin

Stands for binaries and contains certain fundamental utilities, such as ls or cp, that are needed to mount /usr, when that is a separate filesystem, or to run in single-user (administrative) mode when /usr cannot be mounted. In System V Release 4, this is a symlink to /usr/bin. Otherwise, it needs to be on the root filesystem itself.

/boot

Contains all the files needed for a successful boot process. In Research Unix, this was one file rather than a directory. Nowadays it is usually on the root filesystem itself, unless the system or bootloader requires otherwise.

/dev

Stands for devices. Contains file representations of peripheral devices and pseudo-devices. See also: Linux Assigned Names and Numbers Authority. Needs to be on the root filesystem itself.

/etc

Contains system-wide configuration files and system databases. The name stands for et cetera,[9] although it is now sometimes expanded as "editable text configurations". Originally also contained "dangerous maintenance utilities" such as init, but these have typically been moved to /sbin or elsewhere. Needs to be on the root filesystem itself.

/home

Contains user home directories on Linux and some other systems. In the original version of Unix, home directories were in /usr instead.[10] Some systems use or have used different locations still: macOS has home directories in /Users, older versions of BSD put them in /u, FreeBSD has /usr/home.

/lib

Originally essential libraries: C libraries, but not Fortran ones. On modern systems, it contains the shared libraries needed by programs in /bin, and possibly loadable kernel modules or device drivers. Linux distributions may have variants /lib32 and /lib64 for multi-architecture support.

/media

Default mount point for removable devices, such as USB sticks, media players, etc. The directory itself, whose subdirectories serve as mount points, resides on the root partition.

/mnt

Stands for mount. Empty directory commonly used by system administrators as a temporary mount point. The directory itself, whose subdirectories serve as mount points, resides on the root partition.

/opt

Contains locally installed software. Originated in System V, which has a package manager that installs software to this directory (one subdirectory per package).[11]

/proc

procfs virtual filesystem showing information about processes as files.

/root

The home directory for the superuser root, that is, the system administrator. This account's home directory is usually on the initial filesystem, and hence not in /home (which may be a mount point for another filesystem), so that it remains available when maintenance must be performed while other filesystems are unavailable. Such a case could occur, for example, if a hard disk drive suffers failures and cannot be properly mounted.

/sbin

Stands for "system (or superuser) binaries" and contains fundamental utilities, such as init, usually needed to start, maintain and recover the system. Needs to be on the root partition itself.

/srv

Server data (data for services provided by system).

/sys

In some Linux distributions, contains a sysfs virtual filesystem, containing information related to hardware and the operating system. On BSD systems, commonly a symlink to the kernel sources in /usr/src/sys.

/tmp

A place for temporary files not expected to survive a reboot. Many systems clear this directory upon startup or use tmpfs to implement it.

/unix

The Unix kernel in Research Unix and System V. With the addition of virtual memory support in 3BSD, it was renamed /vmunix.

/usr

The "user file system": originally the directory holding user home directories, but already by the Third Edition of Research Unix, ca. 1973, reused to split the operating system's programs over two disks (one of them a 256K fixed-head drive) so that basic commands would either appear in /bin or /usr/bin.[12] It now holds executables, libraries, and shared resources that are not system critical, such as the X Window System, window managers, scripting languages, etc. In older Unix systems, user home directories might still appear in /usr alongside directories containing programs, although by 1984 this depended on local customs.

/usr/include

Stores the development headers used throughout the system. Header files are mostly used by the #include directive in the C language, which is historically how the name of this directory was chosen.

/usr/lib

Stores the needed libraries and data files for programs stored within /usr or elsewhere.

/usr/libexec

Holds programs meant to be executed by other programs rather than by users directly. For example, the Sendmail executable may be found in this directory.[13] It was not present in the FHS until 2011;[14] Linux distributions have traditionally moved the contents of this directory into /usr/lib, where they also resided in 4.3BSD.

/usr/local

Resembles /usr in structure, but its subdirectories are used for additions not part of the operating system distribution, such as custom programs or files from a BSD Ports collection. Usually has subdirectories such as /usr/local/lib or /usr/local/bin.

/usr/share

Architecture-independent program data. On Linux and modern BSD derivatives, this directory has subdirectories such as man for manpages, that used to appear directly under /usr in older versions.

/var

Stands for variable. A place for files that might change frequently, especially in size, such as e-mail sent to users on the system or process-ID lock files.

/var/log

Contains system log files.

/var/mail

The place where all incoming mail is stored. Users (other than root) can access their own mail only. Often, this directory is a symbolic link to /var/spool/mail.

/var/spool

Spool directory. Contains print jobs, mail spools and other queued tasks.

/var/src

Holds the uncompiled source code of some programs.

/var/tmp

The /var/tmp directory is a place for temporary files which should be preserved between system reboots.
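Because the layout varies between systems, it can be useful to check what a given machine actually provides. The sketch below walks a selection of the paths listed above and reports whether each is absent, a symlink (as /bin is on systems that merge it into /usr/bin), a mount point, or a plain directory. The names CONVENTIONAL_PATHS, describe and survey are hypothetical, and the output depends entirely on the system it runs on.

```python
import os

# A selection of the conventional locations described in the table above.
CONVENTIONAL_PATHS = [
    "/", "/bin", "/boot", "/dev", "/etc", "/home", "/lib", "/media",
    "/mnt", "/opt", "/proc", "/root", "/sbin", "/srv", "/sys", "/tmp",
    "/usr", "/usr/local", "/var", "/var/log", "/var/mail", "/var/tmp",
]


def describe(path):
    """Return a short description of how `path` appears on this system."""
    if not os.path.lexists(path):
        return "absent"
    if os.path.islink(path):
        return "symlink -> " + os.readlink(path)
    if os.path.ismount(path):
        return "mount point"
    if os.path.isdir(path):
        return "directory"
    return "file"


def survey(paths=CONVENTIONAL_PATHS):
    """Map each path to its description."""
    return {path: describe(path) for path in paths}


if __name__ == "__main__":
    for path, description in survey().items():
        print(f"{path:12} {description}")
```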

Notes and References

  1. Ritchie, D.M.; Thompson, K. (July 1978). "The UNIX Time-Sharing System". Bell System Tech. J. 57 (6): 1905–1929. doi:10.1002/j.1538-7305.1978.tb02136.x. CiteSeerX 10.1.1.112.595.
  2. Ritchie, Dennis M. (1979). "The Evolution of the Unix Time-sharing System". Language Design and Programming Methodology Conf.
  3. Leffler, Samuel J.; McKusick, Marshall Kirk; Karels, Michael J.; Quarterman, John S. (October 1989). The Design and Implementation of the 4.3BSD UNIX Operating System. ISBN 978-0-201-06196-3.
  4. McKusick, Marshall Kirk; et al. "A Fast Filesystem for Unix". Freebsd.org. CSRG, UC Berkeley. 16 November 2016.
  5. hier(7) man page for 2.9.1 BSD.
  6. hier(7) man page for ULTRIX 4.2.
  7. hier(7) man page for SunOS 4.1.3.
  8. Kraft, George, IV (1 November 2000). "Where to Install My Products on Linux?". 13 November 2014.
  9. Kernighan, Brian W.; Pike, Rob (1984). The UNIX Programming Environment. Prentice-Hall. pp. 63–65. Bibcode 1984upe..book.....K.
  10. Ritchie, Dennis. "Unix Notes from 1972". 14 January 2018.
  11. System V Application Binary Interface Edition 4.1 (1997-03-18).
  12. McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986. CSTR 139. Bell Labs.
  13. "Chapter 7. sendmail". UNICOS/mp Networking Facilities Administration. 14 September 2013.
  14. fhs-spec revision 44.