Loading AI tools
Process of writing a self-compiling compiler From Wikipedia, the free encyclopedia
In computer science, bootstrapping is the technique for producing a self-compiling compiler – that is, a compiler (or assembler) written in the source programming language that it intends to compile. An initial core version of the compiler (the bootstrap compiler) is generated in a different language (which could be assembly language); successive expanded versions of the compiler are developed using this minimal subset of the language. The problem of compiling a self-compiling compiler has been called the chicken-or-egg problem in compiler design, and bootstrapping is a solution to this problem.[1][2]
Bootstrapping is a fairly common practice when creating a programming language. Many compilers for many programming languages are bootstrapped, including compilers for BASIC, ALGOL, C, C#, D, Pascal, PL/I, Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme, Go, Java, Elixir, Rust, Python, Scala, Nim, Eiffel, TypeScript, Vala, Zig and more.
A typical bootstrap process works in three or four stages:[3][4][5]
The full compiler is built twice in order to compare the outputs of the two stages. If they are different, either the bootstrap or the full compiler contains a bug.[3]
Bootstrapping a compiler has the following advantages:[6]
Note that some of these points assume that the language runtime is also written in the same language.
If one needs to compile a compiler for language X written in language X, there is the issue of how the first compiler can be compiled. The different methods that are used in practice include:
Methods for distributing compilers in source code include providing a portable bytecode version of the compiler, so as to bootstrap the process of compiling the compiler with itself. The T-diagram is a notation used to explain these compiler bootstrap techniques.[6] In some cases, the most convenient way to get a complicated compiler running on a system that has little or no software on it involves a series of ever more sophisticated assemblers and compilers.[8]
Assemblers were the first language tools to bootstrap themselves.
The first high-level language to provide such a bootstrap was NELIAC in 1958. The first widely used languages to do so were Burroughs B5000 Algol in 1961 and LISP in 1962.
Hart and Levin wrote a LISP compiler in LISP at MIT in 1962, testing it inside an existing LISP interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting.[9]
The compiler as it exists on the standard compiler tape is a machine language program that was obtained by having the S-expression definition of the compiler work on itself through the interpreter.
— AI Memo 39[9]
This technique is only possible when an interpreter already exists for the very same language that is to be compiled. It borrows directly from the notion of running a program on itself as input, which is also used in various proofs in theoretical computer science, such as the variation of the proof that the halting problem is undecidable that uses Rice's Theorem.
Due to security concerns regarding the Trusting Trust Attack (which involves a compiler being maliciously modified to introduce covert backdoors in programs it compiles or even further replicate the malicious modification in future versions of the compiler itself, creating a perpetual cycle of distrust) and various attacks against binary trustworthiness, multiple projects are working to reduce the effort for not only bootstrapping from source but also allowing everyone to verify that source and executable correspond. These include the Bootstrappable builds project[10] and the Reproducible builds project.[11]
Seamless Wikipedia browsing. On steroids.
Every time you click a link to Wikipedia, Wiktionary or Wikiquote in your browser's search results, it will show the modern Wikiwand interface.
Wikiwand extension is a five stars, simple, with minimum permission required to keep your browsing private, safe and transparent.