Bootstrapping (compilers) explained

In computer science, bootstrapping is the technique for producing a self-compiling compiler, that is, a compiler (or assembler) written in the source programming language that it intends to compile. An initial core version of the compiler (the bootstrap compiler) is written in a different language (which could be assembly language). Successive, expanded versions of the compiler are then developed in the minimal subset of the language that this core version accepts. The problem of compiling a self-compiling compiler has been called the chicken-or-egg problem in compiler design, and bootstrapping is a solution to this problem.[1] [2]

Bootstrapping is a fairly common practice when creating a programming language. Compilers for many programming languages are bootstrapped, including those for BASIC, ALGOL, C, C#, D, Pascal, PL/I, Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme, Go, Java, Elixir, Rust, Python, Scala, Nim, Eiffel, TypeScript, Vala, Zig and others.

Process

A typical bootstrap process works in three or four stages:[3] [4] [5]

The full compiler is built twice so that the outputs of the two stages can be compared. If they differ, either the bootstrap compiler or the full compiler contains a bug.[3]
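The stage comparison can be illustrated with a small Python model, loosely patterned on a three-stage build. This is a sketch under simplifying assumptions, not a real toolchain. The `Binary` type, the `run_compiler` and `stages_agree` functions, and the `"BUG"` marker are all hypothetical, and an emitted binary is represented by a SHA-256 hash of the compiling compiler's behavior and its input. Stage 1 is the full compiler built by the bootstrap compiler, and stages 2 and 3 are the full compiler built by stage 1 and by stage 2 respectively.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    """Toy model of a compiled program (hypothetical, for illustration)."""
    behavior: str  # the source text this binary actually implements
    image: str     # stand-in for the emitted machine code


def run_compiler(compiler: Binary, source: str) -> Binary:
    """Compile `source` with `compiler` in this toy model."""
    # The emitted bytes depend on what the compiler *does* (its behavior),
    # not on how the compiler binary itself was produced.
    image = hashlib.sha256(f"{compiler.behavior}\0{source}".encode()).hexdigest()
    # Assumption: a compiler containing a bug emits a binary that does not
    # faithfully implement its input; the model marks this in `behavior`.
    if "BUG" in compiler.behavior:
        return Binary(source + " [miscompiled]", image)
    return Binary(source, image)


def stages_agree(seed: Binary, full_source: str) -> bool:
    """Build the full compiler three times and compare stages 2 and 3."""
    stage1 = run_compiler(seed, full_source)    # built by the bootstrap compiler
    stage2 = run_compiler(stage1, full_source)  # built by stage 1
    stage3 = run_compiler(stage2, full_source)  # built by stage 2
    return stage2.image == stage3.image


seed = Binary("bootstrap compiler written in C", "seed-image")
print(stages_agree(seed, "full compiler source"))  # True
buggy_seed = Binary("bootstrap compiler with a BUG", "seed-image")
print(stages_agree(buggy_seed, "full compiler source"))  # False
```

Stage 1 and stage 2 normally differ byte for byte because different compilers emitted them. Both implement the same source, however, so stages 2 and 3 should be identical. In this model a mismatch exposes a miscompilation in either the bootstrap compiler or the full compiler. A match does not by itself prove that either compiler is correct.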

Advantages

Bootstrapping a compiler has the following advantages:[6]

Note that some of these points assume that the language runtime is also written in the same language.

Methods

When a compiler for language X is written in X itself, the question arises of how the first such compiler can be compiled. Methods used in practice include:

Methods for distributing compilers in source code include providing a portable bytecode version of the compiler, so as to bootstrap the process of compiling the compiler with itself. The T-diagram is a notation used to explain these compiler bootstrap techniques.[6] In some cases, the most convenient way to get a complicated compiler running on a system that has little or no software on it involves a series of ever more sophisticated assemblers and compilers.[8]
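A T-diagram records three languages for each compiler: the language it accepts, the language it emits, and the language it is written in. The following Python sketch models that notation. The `TDiagram` class, the `compile_with` function, and the language names `X`, `C`, and `M` (machine code) are hypothetical. The sketch assumes a machine that already has a native C compiler, a bootstrap compiler for a new language X written in C, and a full X compiler written in X. It also ignores the distinction between the minimal subset and the full language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TDiagram:
    """A compiler drawn as a T (hypothetical model of the notation)."""
    source: str      # language the compiler accepts
    target: str      # language the compiler emits
    written_in: str  # language the compiler itself is written in


def compile_with(program: TDiagram, by: TDiagram, machine: str) -> TDiagram:
    """Feed the compiler `program`, as source text, to the compiler `by`."""
    if by.written_in != machine:
        raise ValueError(f"{by} cannot be executed on machine {machine}")
    if by.source != program.written_in:
        raise ValueError(f"{by} does not accept programs written in {program.written_in}")
    # The translated compiler keeps its source/target pair but is now
    # expressed in whatever language `by` emits.
    return TDiagram(program.source, program.target, by.target)


cc = TDiagram("C", "M", "M")    # native C compiler already on the machine
boot = TDiagram("X", "M", "C")  # bootstrap compiler for X, written in C
full = TDiagram("X", "M", "X")  # full compiler for X, written in X itself

stage1 = compile_with(boot, cc, "M")      # bootstrap compiler, now native code
stage2 = compile_with(full, stage1, "M")  # full compiler built by the bootstrap
stage3 = compile_with(full, stage2, "M")  # full compiler built by itself
print(stage3)  # TDiagram(source='X', target='M', written_in='M')
```

Calling `compile_with(full, cc, "M")` raises `ValueError`, which is the chicken-or-egg problem in miniature: nothing can translate the full compiler until some compiler for X already runs on the machine.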

History

See main article: History of compiler construction.

Assemblers were the first language tools to bootstrap themselves.

The first high-level language to provide such a bootstrap was NELIAC in 1958. The first widely used languages to do so were Burroughs B5000 Algol in 1961 and LISP in 1962.

Hart and Levin wrote a LISP compiler in LISP at MIT in 1962, testing it inside an existing LISP interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting.

This technique is possible only when an interpreter already exists for the very language that is to be compiled. It borrows directly from the notion of running a program on itself as input, which is also used in various proofs in theoretical computer science, such as the variant of the proof that the halting problem is undecidable that uses Rice's theorem.
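The interpreter-based route can be modeled in the same spirit. In this hypothetical Python sketch, the `Compiler` class and the `can_execute` and `compile_source` functions are illustrative names that are not part of any historical system, and `M` stands for machine code. Under these assumptions, a compiler written in its own language can run only under an interpreter until it has compiled itself to machine code. After that, the native compiler can rebuild its own source without the interpreter, which is what self-hosting means here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    source: str      # language the compiler accepts
    target: str      # language the compiler emits
    written_in: str  # language the compiler itself is expressed in


def can_execute(c: Compiler, machine: str, interpreters: set[str]) -> bool:
    """A program runs if it is native machine code or an interpreter exists for it."""
    return c.written_in == machine or c.written_in in interpreters


def compile_source(program: Compiler, compiler: Compiler,
                   machine: str, interpreters: set[str]) -> Compiler:
    """Run `compiler` (natively or interpreted) with `program` as its input."""
    if compiler.source != program.written_in:
        raise ValueError(f"{compiler} cannot read {program.written_in} source")
    if not can_execute(compiler, machine, interpreters):
        raise RuntimeError(f"nothing on {machine} can execute {compiler.written_in}")
    return Compiler(program.source, program.target, compiler.target)


lisp_src = Compiler("LISP", "M", "LISP")  # compiler for LISP, written in LISP

# Step 1: an existing LISP interpreter runs the compiler on its own source.
native = compile_source(lisp_src, lisp_src, "M", interpreters={"LISP"})

# Step 2: the native compiler recompiles its own source with no interpreter.
rebuilt = compile_source(lisp_src, native, "M", interpreters=set())
print(native == rebuilt)  # True
```

Without the interpreter, `compile_source(lisp_src, lisp_src, "M", interpreters=set())` raises `RuntimeError`. This matches the observation that the route depends on an existing interpreter for the same language.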

Current efforts

The Trusting Trust Attack involves a compiler being maliciously modified to insert covert backdoors into the programs it compiles, or even to replicate the malicious modification in future versions of the compiler itself, creating a perpetual cycle of distrust. Because of security concerns about this attack and various other attacks on binary trustworthiness, multiple projects are working to reduce the effort needed to bootstrap from source and also to let anyone verify that source and executable correspond. These include the Bootstrappable builds project[9] and the Reproducible builds project.[10]
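The final check behind reproducible builds can be sketched in Python. An artifact is rebuilt independently from the same source, and the rebuilt copy is confirmed to be bit-for-bit identical to the published one by comparing cryptographic hashes. This is a hypothetical illustration, not a tool from either project, and the function names are illustrative. It also assumes the build has already been made deterministic (for example, by removing embedded timestamps and build paths), which is where most of the real effort lies.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(published: Path, rebuilt: Path) -> bool:
    """True if an independently rebuilt artifact is identical to the published one."""
    return sha256_of(published) == sha256_of(rebuilt)


# Example call (hypothetical paths):
# builds_match(Path("published/cc"), Path("rebuilt/cc"))
```

Comparing digests rather than raw bytes lets a hash published alongside a binary be checked by anyone who rebuilds it, without downloading the original artifact.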


References

  1. Reynolds, John H. (December 2003). "Bootstrapping a self-compiling compiler from machine X to machine Y". CCSC: Eastern Conference. Journal of Computing Sciences in Colleges. 19 (2): 175–181. "The idea of a compiler written in the language it compiles stirs up the old 'chicken-or-the-egg' conundrum: Where does the first one come from?"
  2. Glück, Robert (2012). "Bootstrapping compiler generators from partial evaluators". In Clarke, Edmund; Virbitskaite, Irina; Voronkov, Andrei (eds.). Perspectives of Systems Informatics: 8th International Andrei Ershov Memorial Conference, PSI 2011, Novosibirsk, Russia, June 27 – July 1, 2011, Revised Selected Papers. Lecture Notes in Computer Science. Vol. 7162. Springer. pp. 125–141. doi:10.1007/978-3-642-29709-0_13. ISBN 978-3-642-29708-3. "Getting started presents the chicken-and-egg problem familiar from compiler construction: one needs a compiler to bootstrap a compiler, and bootstrapping compiler generators is no exception."
  3. "Installing GCC: Building". GNU Project - Free Software Foundation (FSF).
  4. "rust-lang/rust: bootstrap". GitHub.
  5. "Advanced Build Configurations — LLVM 10 documentation". llvm.org.
  6. Terry, Patrick D. (1997). "3. Compiler Construction and Bootstrapping". Compilers and Compiler Generators: An Introduction With C++. International Thomson Computer Press. ISBN 1-85032-298-8. Original: http://www.oopweb.com/Compilers/Documents/Compilers/Volume/cha03s.htm. Archived 2009-11-23 at https://web.archive.org/web/20091123154911/http://www.oopweb.com/Compilers/Documents/Compilers/Volume/cha03s.htm.
  7. Wirth, Niklaus (2021-02-22). "50 years of Pascal". Communications of the ACM. Association for Computing Machinery (ACM). 64 (3): 39–41. doi:10.1145/3447525. ISSN 0001-0782. S2CID 231991096.
  8. Grimley-Evans, Edmund (2003-04-23). "Bootstrapping a simple compiler from nothing". homepage.ntlworld.com. Archived 2010-03-03 at https://web.archive.org/web/20100303235322/http://homepage.ntlworld.com/edmund.grimley-evans/bcompiler.html (original link dead).
  9. "Bootstrappable builds". bootstrappable.org.
  10. "Reproducible Builds — a set of software development practices that create an independently-verifiable path from source to binary code". reproducible-builds.org.