This is a list of notable lexer generators and parser generators for various language classes.
Regular languages are a category of languages (sometimes termed Chomsky Type 3) which can be matched by a state machine (more specifically, by a deterministic finite automaton or a nondeterministic finite automaton) constructed from a regular expression. In particular, a regular language can match constructs like "A follows B", "Either A or B", "A, followed by zero or more instances of B", but cannot match constructs which require consistency between non-adjacent elements, such as "some instances of A followed by the same number of instances of B", and also cannot express the concept of recursive "nesting" ("every A is eventually followed by a matching B"). A classic example of a problem which a regular grammar cannot handle is the question of whether a given string contains correctly nested parentheses. (This is typically handled by a Chomsky Type 2 grammar, also termed a context-free grammar.)
Name | Lexer algorithm | Grammar, code | Development platform | License | ||
---|---|---|---|---|---|---|
Alex | , BSD | |||||
AnnoFlex | , BSD | |||||
Astir | DFA table driven, with branching | , MIT | ||||
AustenX | , BSD | |||||
C# Flex | , GNU GPL | |||||
C# Lex | ? | |||||
CookCC | , Apache 2.0 | |||||
DFA | DFA compressed matrix | Windows, Visual Studio | BSD | |||
Dolphin | ||||||
DFA table driven | , BSD | |||||
gelex | , MIT | |||||
golex | , BSD-style | |||||
gplex | , BSD-like | |||||
JFlex | , BSD | |||||
JLex | , BSD-like | |||||
, proprietary, CDDL | ||||||
lexertl | ? | , GNU LGPL | ||||
Quex | DFA direct code | , GNU LGPL | ||||
, GNU GPL, MIT[1] [2] | ||||||
DFA direct code, DFA table driven, and NFA regex libraries | , BSD | |||||
DFA direct code | , public domain |
Context-free languages are a category of languages (sometimes termed Chomsky Type 2) which can be matched by a sequence of replacement rules, each of which essentially maps each non-terminal element to a sequence of terminal elements and/or other nonterminal elements. Grammars of this type can match anything that can be matched by a regular grammar, and furthermore, can handle the concept of recursive "nesting" ("every A is eventually followed by a matching B"), such as the question of whether a given string contains correctly nested parentheses. The rules of Context-free grammars are purely local, however, and therefore cannot handle questions that require non-local analysis such as "Does a declaration exist for every variable that is used in a function?". To do so technically would require a more sophisticated grammar, like a Chomsky Type 1 grammar, also termed a context-sensitive grammar. However, parser generators for context-free grammars often support the ability for user-written code to introduce limited amounts of context-sensitivity. (For example, upon encountering a variable declaration, user-written code could save the name and type of the variable into an external data structure, so that these could be checked against later variable references detected by the parser.)
The deterministic context-free languages are a proper subset of the context-free languages which can be efficiently parsed by deterministic pushdown automata.
Name | Parsing algorithm | Input grammar notation | Grammar, code | Development platform | License | ||||
---|---|---|---|---|---|---|---|---|---|
ANTLR4 | Adaptive LL(*)[3] | generated | , BSD | ||||||
ANTLR3 | LL(*) | generated | , BSD | ||||||
APG[4] | none | , BSD | |||||||
Beaver[5] [6] | LALR(1) | EBNF | Java | external | , BSD | ||||
external | , GNU GPL with exception | ||||||||
? | external | , public domain | |||||||
LALR(1) | external | , public domain | |||||||
CL-Yacc[7] [8] | LALR(1) | Lisp | Common Lisp | external | , MIT | ||||
LL(1) | generated | , GNU GPL | |||||||
CppCC[9] [10] | LL(k) | ? | C++ | generated | , GNU GPL | ||||
CUP[11] [12] | LALR(1) | ? | Java | external | , BSD-like | ||||
Eli[13] [14] | LALR(1) | ? | generated | , GNU GPL, GNU LGPL | |||||
Essence[15] | LR(?) | ? | Scheme 48 | external | , BSD | ||||
eyapp[16] | LALR(1) | ? | Perl | external or generated | , Artistic | ||||
GOLD[17] | LALR(1) | BNF | x86 assembly language, ANSI C, C#, D, Java, Pascal, Object Pascal, Python, Visual Basic 6, Visual Basic .NET, Visual C++ | generated | , zlib modified | ||||
Hime Parser Generator[18] | LALR(1), GLR | BNF dialect | C#, Java, Rust | generated | , GNU LGPL | ||||
Hyacc[19] | LR(1), LALR(1), LR(0) | Yacc | C | external | , GNU GPL | ||||
JavaCC[20] [21] | LL(k) | EBNF | Java, C++, JavaScript (via GWT compiler)[22] | generated | , BSD | ||||
LL(1), LALR(1) | ? | ? | ? | ? | |||||
LL(k) | ? | generated | , GNU GPL | ||||||
JS/CC | LALR(1) | internal | , BSD | ||||||
KDevelop-PG-Qt | ? | generated or external | , GNU LGPL | ||||||
Kelbt | Backtracking LALR(1) | ? | generated | , GNU GPL | |||||
kmyacc | LALR(1) | ? | external | , GNU GPL | |||||
Lapg | LALR(1) | ? | generated | , GNU GPL | |||||
Lark | generated | , MIT | |||||||
LALR(1) | BNF dialect[23] | external | , public domain | ||||||
Lezer[24] [25] [26] | EBNF dialect | generated | , MIT | ||||||
Lime | LALR(1) | ? | external | , GNU GPL | |||||
LISA | LR(?), LL(?), LALR(?), SLR(?) | ? | generated | , public domain | |||||
LLgen | LL(1) | ? | external | , BSD | |||||
LLnextgen | LL(1) | ? | external | , GNU GPL | |||||
LLLPG | LL(k) + syntactic and semantic predicates | ANTLR-like | generated (?) | , GNU LGPL | |||||
LPG | Backtracking LALR(k) | ? | generated | , EPL | |||||
LRSTAR[27] | LALR(1), LALR(*) | YACC, ANTLR, EBNF | generated | , BSD | |||||
Menhir | LR(1) | ? | generated | , QPL | |||||
ML-Yacc | LALR(1) | ? | external | ? | |||||
Monkey | LR(1) | ? | generated | , GNU GPL | |||||
Msta | LALR(k), LR(k) | external or generated | , GNU GPL | ||||||
MTP (More Than Parsing) | LL(1) | ? | generated | , GNU GPL | |||||
MyParser | LL(*) | internal | , MIT | ||||||
NLT | C#/BNF-like | mixed | , MIT | ||||||
ocamlyacc | LALR(1) | ? | external | , QPL | |||||
olex | LL(1) | ? | generated | , GNU GPL | |||||
Parsec | none | , BSD | |||||||
yapp | LALR(1) | ? | external | , GNU GPL | |||||
Parser Objects | LL(k) | ? | ? | , zlib | |||||
PCCTS | ? | ? | ? | ? | |||||
LALR(1) | BNF | generated | , MIT | ||||||
PlyPlus | LALR(1) | EBNF | generated | , MIT | |||||
PRECC | LL(k) | ? | generated | , GNU GPL | |||||
racc[28] | LALR(1) | BNF-like, yacc-like[29] | Ruby | ? | |||||
QLALR | LALR(1) | ? | external | , GNU GPL | |||||
LALR(1) | ? | generated | , GNU LGPL | ||||||
SLK[30] | LL(k) LR(k) LALR(k) | external | SLK[31] | ||||||
SLY[32] | LALR(1) | BNF | generated | , MIT | |||||
SP (Simple Parser) | generated | , GNU LGPL | |||||||
? | internal | , Boost | |||||||
Styx | LALR(1) | ? | generated | , GNU LGPL | |||||
Sweet Parser | LALR(1) | ? | generated | , zlib | |||||
Tap | LL(1) | ? | generated | , GNU GPL | |||||
TextTransformer | LL(k) | ? | generated | ||||||
TinyPG | LL(1) | ? | ? | ? | , CPOL 1.0 | ||||
? | generated | , GNU LGPL | |||||||
TP Yacc | LALR(1) | ? | external | , GNU GPL | |||||
Tree-Sitter[33] | C, bindings (Rust, WebAssembly, JavaScript, Python, many other) | generated + external | , MIT | ||||||
Tunnel Grammar Studio | generated | ||||||||
UltraGram | C++, Java, C#, Visual Basic .NET | external | , public domain | ||||||
UniCC | LALR(1) | generated | , BSD | ||||||
LL(1) | ? | Java | ? | generated | ? | ||||
LALR(1) | external | , CPL & CDDL | |||||||
Yacc++ | LR(1), LALR(1) | generated or external | |||||||
Yapps | LL(1) | ? | generated | , MIT | |||||
yecc | LALR(1) | ? | generated | , Apache 2.0 | |||||
LR(1), LALR(1) | ? | generated | |||||||
YooParse | LR(1), LALR(1) | ? | external | , MIT | |||||
Parse[34] | LR(1) | BNF in C++ types | ? | ? | none | , MIT | |||
GGLL | LL(1) | Graph | generated | , MIT | |||||
Product | Parsing algorithm | Input grammar notation | Grammar, code | Development platform | License |
This table compares parser generators with parsing expression grammars, deterministic boolean grammars.
Name | Parsing algorithm | Grammar, code | Development platform | License | ||
---|---|---|---|---|---|---|
AustenX | Packrat (modified) | , BSD | ||||
Aurochs | , GNU GPL | |||||
BNFlite | Recursive descent | , MIT | ||||
Canopy | , GNU GPL | |||||
CL-peg | , MIT | |||||
Drat! | , GNU GPL | |||||
Frisby | , BSD | |||||
, BSD | ||||||
Grako | Packrat + Cut + Left Recursion | Python, C++ (beta) | , BSD | |||
IronMeta | , BSD | |||||
Laja | 2-phase scannerless top-down backtracking + runtime support | , GNU GPL | ||||
lars::Parser | Packrat (supporting left-recursion and grammar ambiguity) | Identical | , BSD | |||
LPeg | Parsing machine | , MIT | ||||
lug | Parsing machine | , MIT | ||||
Mouse | Recursive descent (modified, limited memoization and left-recursion) | , Apache 2.0 | ||||
Narwhal | , BSD | |||||
Nearley | , MIT | |||||
Nemerle.Peg | Recursive descent + Pratt | , BSD | ||||
neotoma | , MIT | |||||
NPEG | Recursive descent | , MIT | ||||
Packrat (modified, partial memoization) | , MIT | |||||
PackCC | Packrat (modified, left-recursion support) | , MIT | ||||
, MIT | ||||||
, BSD | ||||||
Recursive descent | , Apache 2.0 | |||||
Lambda PEG | Recursive descent | , Apache 2.0 | ||||
parsepp | Recursive descent | , public domain | ||||
Parsnip | , GNU GPL | |||||
Patterns | Parsing machine | Identical | , MIT | |||
peg | Recursive descent | , MIT | ||||
PEG.js | Packrat (partial memoization) | , MIT | ||||
Peggy[35] | Packrat (partial memoization) | , MIT | ||||
Pegasus | Recursive descent, Packrat (selectively) | , MIT | ||||
pegc | Recursive descent | , public domain | ||||
pest | Recursive descent | , MIT, Apache 2.0 | ||||
PetitParser | , MIT | |||||
PEGTL | Recursive descent | , Boost | ||||
Parser Grammar Engine (PGE) | Hybrid recursive descent / operator precedence[36] | , Artistic 2.0 | ||||
PyPy rlib | , MIT | |||||
Rats! | , GNU LGPL | |||||
Spirit2 | Recursive descent | , Boost | ||||
Treetop | Recursive descent | , MIT | ||||
Yard | Recursive descent | , MIT or public domain | ||||
Waxeye | Parsing machine | , MIT | ||||
PHP PEG | PEG Parser? | , BSD |
This table compares parser generator languages with a general context-free grammar, a conjunctive grammar, or a boolean grammar.
Name | Parsing algorithm | Input grammar notation | Output languages ! | Grammar, code | Development platform | License | |||
---|---|---|---|---|---|---|---|---|---|
ACCENT | Yacc variant | external | , GNU GPL | ||||||
APaGeD | GLR, LALR(1), LL(k) | ? | generated | , Artistic | |||||
, except XML | external | , GNU GPL | |||||||
? | generated | ||||||||
DParser | ? | , BSD | |||||||
Dypgen | ? | generated | , CeCILL-B | ||||||
E3 | ? | external, or scannerless | ? | ||||||
Elkhound | ? | external | , BSD | ||||||
GDK | ? | generated | , MIT | ||||||
Happy | ? | external | , BSD | ||||||
Hime Parser Generator | ? | generated | , GNU LGPL | ||||||
IronText Library | generated or external | , Apache 2.0 | |||||||
Jison | LALR(1), LR(0), SLR(1) | generated | , MIT | ||||||
Syntax | LALR(1), LR(0), SLR(1) CLR(1) LL(1) | generated | , MIT | ||||||
Laja | Scannerless, two phase | Laja | , GNU GPL | ||||||
ModelCC | Annotated class model | Generated | generated | , BSD | |||||
P3 | Earley–combinators | BNF-like | external, or scannerless | ? | |||||
P4 | Earley–combinators, infinitary CFGs | BNF-like | external, or scannerless | ? | |||||
Scannerless GLR (Boolean grammars) | ? | , BSD | |||||||
SDF/SGLR | , BSD | ||||||||
SmaCC | GLR(1), LALR(1), LR(1) | ? | internal | , MIT | |||||
SPARK | ? | external | , MIT | ||||||
Tom | ? | Generated | none | , "No licensing or copyright restrictions" | |||||
UltraGram | ? | generated | |||||||
Wormhole | ? | , MIT | |||||||
Whale Calf | General tabular, SLL(k), Linear normal form (conjunctive grammars), LR, Binary normal form (Boolean grammars) | ? | external | ||||||
yaep | Yacc-like | external | , GNU LGPL |
This table compares parser generators with context-sensitive grammars.