Multi-pass compiler explained

A multi-pass compiler is a type of compiler that processes the source code or abstract syntax tree of a program several times. This is in contrast to a one-pass compiler, which traverses the program only once. Each pass takes the result of the previous pass as the input, and creates an intermediate output. In this way, the (intermediate) code is improved pass by pass, until the final pass produces the final code.

Multi-pass compilers are sometimes called wide compilers,[1] referring to the greater scope of the passes: they can "see" the entire program being compiled, instead of just a small portion of it. The wider scope thus available to these compilers allows better code generation (e.g. smaller code size, faster code) compared to the output of one-pass compilers, at the cost of higher compiler time and memory consumption. In addition, some languages cannot be compiled in a single pass, as a result of their design.

Typical multi-pass compiler

Lexical analysis

This stage of a multi-pass compiler is to remove irrelevant information from the source program that syntax analysis will not be able to use or interpret. Irrelevant information could include things like comments and white space. In addition to removing the irrelevant information, the lexical analysis determines the lexical tokens of the language. This step means that forward declaration is generally not necessary if a multi-pass compiler is used.This phase is focused on breaking a sequence of characters into tokens with attributes such as kind, type, value, and potentially others, as well.

Syntax analysis

Syntax analysis is responsible for looking at syntax rules of the language (often as a context-free grammar), and building some intermediate representation of the language. An example of this intermediate representation could be something like an abstract syntax tree or a directed acyclic graph.

Semantic analysis

Semantic analysis takes the representation made from syntax analysis and applies semantic rules to the representation to make sure that the program meets the semantic rules requirements of the language. For example, in the example below at the stage of semantic analysis if the language required that conditions on if statements were boolean expressions the cond would be type-checked to make sure it would be a valid boolean expression.

if (cond) else

In addition to performing semantic analysis at this stage of compilation, often symbol tables are created in order to assist in code generation.

Code generation

This final stage of a typical compiler converts the intermediate representation of program into an executable set of instructions (often assembly). This last stage is the only stage in compilation that is machine dependent. There can also be optimization done at this stage of compilation that make the program more efficient.

Other passes of compiler include intermediate code generation phase which takes place before code generation and code optimization phase which can take place when the source program is written, or after intermediate code generation phase, or after code generation phase.

Advantages of multi-pass compilers

Machine Independent: Since the multiple passes include a modular structure, and the code generation decoupled from the other steps of the compiler, the passes can be reused for different hardware/machines.

More Expressive Languages: Multiple passes obviate the need for forward declarations, allowing mutual recursion to be implemented elegantly. The prime examples of languages requiring forward declarations due to the requirement of being compilable in a single pass include C and Pascal, whereas Java does not have forward declarations.

References

Notes and References

  1. Book: Grune . Dick . van Reeuwijk . Kees . Bal . Henri . Jacobs . Ceriel . Langendoen . Koen . Modern Compiler Design . 2012 . Springer . Amsterdam, the Netherlands . 978-1-4939-4472-9 . 27 . Second.