GNU Bison explained

GNU Bison
Author:Robert Corbett
Developer:The GNU Project
Programming Language:C and m4
Operating System:Unix-like
Genre:Parser generator
License:GPL

GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification in Bison syntax (described as "machine-readable BNF"[1]), warns about any parsing ambiguities, and generates a parser that reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar.

The generated parsers are portable: they do not require any specific compilers. Bison by default generates LALR(1) parsers but it can also generate canonical LR, IELR(1) and GLR parsers.[2]

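In POSIX mode, Bison is compatible with Yacc, but it also has several extensions over this earlier program, including counterexample generation, reentrant parsers, named references and support for several output languages.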

Flex, an automatic lexical analyser, is often used with Bison, to tokenise input data and provide Bison with tokens.[3]

Bison was originally written by Robert Corbett in 1985.[4] Later, in 1989, Robert Corbett released another parser generator named Berkeley Yacc. Bison was made Yacc-compatible by Richard Stallman.[5]

Bison is free software and is available under the GNU General Public License, with an exception (discussed below) allowing its generated code to be used without triggering the copyleft requirements of the licence.

Features

Counterexample generation

One delicate issue with LR parser generators is the resolution of conflicts (shift/reduce and reduce/reduce conflicts). With many LR parser generators, resolving conflicts requires the analysis of the parser automaton, which demands some expertise from the user.

To help the user understand conflicts more intuitively, Bison can instead automatically generate counterexamples. For ambiguous grammars, Bison can often even produce counterexamples that show the grammar is ambiguous.

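For instance, on a grammar suffering from the infamous dangling else problem, Bison reports a shift/reduce conflict on the "else" token, together with an example input that the grammar can derive in two different ways.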
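The dangling else ambiguity can be seen in C itself, whose grammar has the same shape. The sketch below is an illustration only, and the function name classify is hypothetical. C attaches an else to the nearest unmatched if, which corresponds to resolving the shift/reduce conflict in favour of the shift, as Bison does by default.

```c
/* Returns 1 if the inner "then" branch ran, 2 if the else branch ran,
 * and 0 if neither ran.
 * The indentation is deliberately misleading: the else belongs to the
 * inner if (b), not to the outer if (a). */
int classify(int a, int b)
{
    int result = 0;
    if (a)
        if (b)
            result = 1;
    else
        result = 2;
    return result;
}
```

With this binding, classify(1, 0) returns 2, while classify(0, 0) returns 0 because the else is never reached when the outer condition is false. A compiler may warn about the ambiguous layout, which is the same concern that Bison's conflict report raises for a grammar.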

Reentrancy

Reentrancy is a feature which has been added to Bison and does not exist in Yacc.

Normally, Bison generates a parser which is not reentrant. To make the parser reentrant, the declaration %define api.pure must be used. More details on Bison reentrancy can be found in the Bison manual.[6]

Output languages

Bison can generate code for C, C++, D and Java.[7]

To use a Bison-generated parser from other languages, a language binding tool such as SWIG can be used.

License and distribution of generated code

Because Bison generates source code that in turn gets added to the source code of other software projects, it raises some simple but interesting copyright questions.

A GPL-compatible license is not required

The code generated by Bison includes significant amounts of code from the Bison project itself. The Bison package is distributed under the terms of the GNU General Public License (GPL), but an exception has been added so that the GPL does not apply to output.[8][9]

Earlier releases of Bison stipulated that parts of its output were also licensed under the GPL, due to the inclusion of the yyparse function from the original source code in the output.

Distribution of packages using Bison

Free software projects that use Bison can choose whether to distribute the source code that their project feeds into Bison or the resulting C code output by Bison. Either is sufficient for a recipient to compile the project source code. However, distributing only the input carries the minor inconvenience that recipients must have a compatible copy of Bison installed so that they can generate the necessary C code when compiling the project. Distributing only the generated C code makes it very difficult for recipients to modify the parser, since this code was written neither by a human nor for humans: its purpose is to be fed directly into a C compiler.

These problems can be avoided by distributing both the input files and the generated code. Most people will compile using the generated code, no different from any other software package, but anyone who wants to modify the parser component can modify the input files first and re-generate the generated files before compiling. Projects distributing both usually do not have the generated files in their revision control systems. The files are only generated when making a release.

Some licenses, such as the GPL, require that the source code be in "the preferred form of the work for making modifications to it". GPL'd projects using Bison must thus distribute the files which are the input for Bison. Of course, they can also include the generated files.

Use

Because Bison was written as a replacement for Yacc and is largely compatible with it, the code from many projects using Bison could equally be fed into Yacc. This makes it difficult to determine whether a project "uses" Bison-specific source code. In many cases, the "use" of Bison could be trivially replaced by the equivalent use of Yacc or one of its other derivatives.

Bison has features not found in Yacc, so some projects can be truly said to "use" Bison, since Yacc would not suffice.

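Projects known to "use" Bison in the looser sense, in that they use free software development tools and distribute code intended to be fed into Bison or a Bison-compatible package, include Bison's own grammar parser,[10] CMake,[11] GNU LilyPond,[15] GNU Octave,[17] Perl 5,[18] PostgreSQL,[19] Ruby MRI[20] and syslog-ng.[21] GCC[12][13] and Go[14] used Bison-generated parsers in earlier releases.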

A complete reentrant parser example

The following example shows how to use Bison and flex to write a simple calculator program (only addition and multiplication) and a program for creating an abstract syntax tree. The next two files provide definition and implementation of the syntax tree functions.

/*
 * Expression.h
 * Definition of the structure used to build the syntax tree.
 */

#ifndef __EXPRESSION_H__
#define __EXPRESSION_H__

/**
 * @brief The operation type
 */
typedef enum tagEOperationType EOperationType;

/**
 * @brief The expression structure
 */
typedef struct tagSExpression SExpression;

/**
 * @brief It creates an identifier
 * @param value The number value
 * @return The expression or NULL in case of no memory
 */
SExpression *createNumber(int value);

/**
 * @brief It creates an operation
 * @param type The operation type
 * @param left The left operand
 * @param right The right operand
 * @return The expression or NULL in case of no memory
 */
SExpression *createOperation(EOperationType type, SExpression *left, SExpression *right);

/**
 * @brief Deletes an expression
 * @param b The expression
 */
void deleteExpression(SExpression *b);

#endif /* __EXPRESSION_H__ */

/*
 * Expression.c
 * Implementation of functions used to build the syntax tree.
 */

#include "Expression.h"
#include <stdlib.h>

/**
 * @brief Allocates space for expression
 * @return The expression or NULL if not enough memory
 */
static SExpression *allocateExpression()

SExpression *createNumber(int value)

SExpression *createOperation(EOperationType type, SExpression *left, SExpression *right)

void deleteExpression(SExpression *b)
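The type definitions and function bodies could look like the following minimal sketch. The enumerator names (eVALUE, eMULTIPLY, eADD) and the struct fields (type, value, left, right) are assumptions made for illustration; only the type names and function signatures come from the declarations above.

```c
#include <stdlib.h>

/* Assumed node kinds: a number leaf or a binary operation. */
typedef enum tagEOperationType {
    eVALUE,
    eMULTIPLY,
    eADD
} EOperationType;

/* Assumed node layout. */
typedef struct tagSExpression {
    EOperationType type;
    int value;                     /* meaningful only when type == eVALUE */
    struct tagSExpression *left;   /* NULL for a number leaf */
    struct tagSExpression *right;  /* NULL for a number leaf */
} SExpression;

/* Allocates a zero-initialised leaf, or returns NULL if out of memory. */
static SExpression *allocateExpression(void)
{
    /* The cast keeps the code valid when compiled as C++ as well. */
    SExpression *b = (SExpression *)malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->type = eVALUE;
    b->value = 0;
    b->left = NULL;
    b->right = NULL;
    return b;
}

SExpression *createNumber(int value)
{
    SExpression *b = allocateExpression();
    if (b == NULL)
        return NULL;
    b->type = eVALUE;
    b->value = value;
    return b;
}

SExpression *createOperation(EOperationType type, SExpression *left, SExpression *right)
{
    SExpression *b = allocateExpression();
    if (b == NULL)
        return NULL;
    b->type = type;
    b->left = left;
    b->right = right;
    return b;
}

/* Frees a node and, recursively, both of its subtrees. */
void deleteExpression(SExpression *b)
{
    if (b == NULL)
        return;
    deleteExpression(b->left);
    deleteExpression(b->right);
    free(b);
}
```

In this sketch an operation node takes ownership of its operands, so deleting the root releases the whole tree.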

The tokens needed by the Bison parser will be generated using flex.

%

%option outfile="Lexer.c" header-file="Lexer.h"
%option warn nodefault

%option reentrant noyywrap never-interactive nounistd
%option bison-bridge

%%

[\r\n\t]

[0-9]+

"*"
"+"
"("
")"

.

%%

int yyerror(SExpression **expression, yyscan_t scanner, const char *msg)

The names of the tokens are typically neutral: "TOKEN_PLUS" and "TOKEN_STAR", not "TOKEN_ADD" and "TOKEN_MULTIPLY". For instance, if we were to support the unary "+" (as in "+1"), it would be wrong to name this "+" "TOKEN_ADD". In a language such as C, "int *ptr" denotes the definition of a pointer, not a product: it would be wrong to name this "*" "TOKEN_MULTIPLY".

Since the tokens are provided by flex, we must provide the means to communicate between the parser and the lexer.[22] The data type used for communication, YYSTYPE, is set using Bison's %union declaration.

Since this sample uses the reentrant versions of both flex and Bison, we must provide parameters for the yylex function when it is called from yyparse. This is done through Bison's %lex-param and %parse-param declarations.[23]
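The calling convention can be illustrated without generated code. The following hand-written sketch mimics the shape of a pure lexer call: the caller passes a pointer to a semantic-value union and a scanner state, and receives the token kind as the return value. All names here (SemanticValue, Scanner, nextToken and the TOK_ constants) are hypothetical stand-ins, not the identifiers that Bison or flex generate.

```c
#include <ctype.h>

/* Hypothetical token kinds; 0 marks the end of input. */
enum {
    TOK_END = 0, TOK_NUMBER, TOK_PLUS, TOK_STAR,
    TOK_LPAREN, TOK_RPAREN, TOK_INVALID
};

/* Stand-in for the %union type: one member per kind of semantic value. */
typedef union {
    int value;
} SemanticValue;

/* Stand-in for the scanner handle: all state lives here, none in globals. */
typedef struct {
    const char *cursor;
} Scanner;

/* Returns the next token kind; for numbers, stores the value in *lval. */
int nextToken(SemanticValue *lval, Scanner *scanner)
{
    char c;

    /* Skip blanks. */
    while (*scanner->cursor == ' ' || *scanner->cursor == '\t' ||
           *scanner->cursor == '\r' || *scanner->cursor == '\n')
        scanner->cursor++;

    c = *scanner->cursor;
    if (c == '\0')
        return TOK_END;

    if (isdigit((unsigned char)c)) {
        int n = 0;
        while (isdigit((unsigned char)*scanner->cursor)) {
            n = n * 10 + (*scanner->cursor - '0');
            scanner->cursor++;
        }
        lval->value = n;
        return TOK_NUMBER;
    }

    scanner->cursor++;
    switch (c) {
    case '+': return TOK_PLUS;
    case '*': return TOK_STAR;
    case '(': return TOK_LPAREN;
    case ')': return TOK_RPAREN;
    default:  return TOK_INVALID;
    }
}
```

Because every call receives its own Scanner and SemanticValue, two inputs can be scanned in an interleaved fashion without interfering with each other, which is the property that the reentrant options are meant to provide.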

%

%code requires

%output  "Parser.c"
%defines "Parser.h"

%define api.pure
%lex-param   { yyscan_t scanner }
%parse-param { SExpression **expression }
%parse-param { yyscan_t scanner }

%union

%token TOKEN_LPAREN "("
%token TOKEN_RPAREN ")"
%token TOKEN_PLUS "+"
%token TOKEN_STAR "*"
%token TOKEN_NUMBER "number"

%type expr

/* Precedence (increasing) and associativity:
   a+b+c is (a+b)+c: left associativity
   a+b*c is a+(b*c): the precedence of "*" is higher than that of "+". */
%left "+"
%left "*"

%%

input : expr ;

expr
    : expr[L] "+" expr[R]
    | expr[L] "*" expr[R]
    | "(" expr[E] ")"
    | "number"
    ;

%%

The code needed to obtain the syntax tree using the parser generated by Bison and the scanner generated by flex is the following.

/*
 * main.c file
 */

#include "Expression.h"
#include "Parser.h"
#include "Lexer.h"

#include <stdio.h>

int yyparse(SExpression **expression, yyscan_t scanner);

SExpression *getAST(const char *expr)

int evaluate(SExpression *e)
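A recursive evaluator over such a tree could be written as in the sketch below. The node layout and the enumerator names (eVALUE, eMULTIPLY, eADD) are assumptions made for illustration, and the helper evaluateSample is hypothetical: it builds the tree for 2 + 3 * 4 by hand so that the block stands alone without the generated parser.

```c
#include <stddef.h>

/* Assumed node kinds and layout, repeated so the block is self-contained. */
typedef enum tagEOperationType { eVALUE, eMULTIPLY, eADD } EOperationType;

typedef struct tagSExpression {
    EOperationType type;
    int value;
    struct tagSExpression *left;
    struct tagSExpression *right;
} SExpression;

/* Post-order walk: evaluate both operands, then combine them. */
int evaluate(SExpression *e)
{
    switch (e->type) {
    case eVALUE:
        return e->value;
    case eMULTIPLY:
        return evaluate(e->left) * evaluate(e->right);
    case eADD:
        return evaluate(e->left) + evaluate(e->right);
    default:
        return 0; /* unknown node kind: treated as 0 in this sketch */
    }
}

/* Hypothetical helper: the tree a parser would build for 2 + 3 * 4,
 * with "*" binding tighter than "+". */
int evaluateSample(void)
{
    SExpression two     = { eVALUE, 2, NULL, NULL };
    SExpression three   = { eVALUE, 3, NULL, NULL };
    SExpression four    = { eVALUE, 4, NULL, NULL };
    SExpression product = { eMULTIPLY, 0, &three, &four };
    SExpression sum     = { eADD, 0, &two, &product };
    return evaluate(&sum);
}
```

Here evaluateSample returns 14, since the multiplication node sits below the addition node, mirroring the precedence declared in the grammar.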

int main(void)

A simple makefile to build the project is the following.

# Makefile

FILES = Lexer.c Parser.c Expression.c main.c
CC = g++
CFLAGS = -g -ansi

test: $(FILES)
	$(CC) $(CFLAGS) $(FILES) -o test

Lexer.c: Lexer.l
	flex Lexer.l

Parser.c: Parser.y Lexer.c
	bison Parser.y

clean:
	rm -f *.o *~ Lexer.c Lexer.h Parser.c Parser.h test

Notes and References

  1. Language and Grammar (Bison 3.8.1). www.gnu.org. 2021-12-26.
  2. Bison Manual: Introduction. https://www.gnu.org/software/bison/manual/html_node/Introduction.html
  3. Levine, John R. (August 2009). flex & bison. O'Reilly Media. ISBN 978-0-596-15597-1.
  4. Corbett, Robert Paul (June 1985). Static Semantics and Compiler Error Recovery (Ph.D.). DTIC ADA611756.
  5. AUTHORS. bison.git. 2017-08-26.
  6. Bison Manual: A Pure (Reentrant) Parser. https://www.gnu.org/software/bison/manual/bison.html#Pure-Decl
  7. Bison Manual: Bison Declaration Summary. https://www.gnu.org/software/bison/manual/html_node/Decl-Summary.html
  8. Bison Manual: Conditions for Using Bison. https://www.gnu.org/software/bison/manual/html_node/Conditions.html
  9. A source code file, parse-gram.c, which includes the exception. http://git.savannah.gnu.org/cgit/bison.git/tree/src/parse-gram.c
  10. parse-gram.y. bison.git. 2020-07-29.
  11. LexerParser in CMake. github.com.
  12. GCC 3.4 Release Series Changes, New Features, and Fixes. https://gcc.gnu.org/gcc-3.4/changes.html
  13. GCC 4.1 Release Series Changes, New Features, and Fixes. https://gcc.gnu.org/gcc-4.1/changes.html
  14. Golang grammar definition. https://forum.golangbridge.org/t/golang-grammar-definition/14473/3
  15. Parser.yy - GNU LilyPond Git Repository. git.savannah.gnu.org.
  16. 4. Parsing SQL - flex & bison [Book].
  17. GNU Octave: Libinterp/Parse-tree/Oct-parse.cc Source File.
  18. What is new for perl 5.10.0?. perl.org.
  19. The Parser Stage. postgresql.org. 30 September 2021.
  20. Ruby MRI Parser. github.com.
  21. syslog-ng's XML Parser. github.com. 14 October 2021.
  22. Flex Manual: C Scanners with Bison Parsers. http://flex.sourceforge.net/manual/Bison-Bridge.html
  23. Bison Manual: Calling Conventions for Pure Parsers. https://www.gnu.org/software/bison/manual/html_node/Pure-Calling.html