E-graph explained

In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.

Definition and operations

Let

\Sigma

be a set of uninterpreted functions, where

\Sigma_n

is the subset of

\Sigma

consisting of functions of arity

. Let

be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of

f\in\Sigma_n

to e-class IDs

i_1,i_2,\ldots,i_n\inid

is denoted

f(i_1,i_2,\ldots,i_n)

and called an e-node.

The e-graph then represents equivalence classes of e-nodes, using the following data structures:

A union-find structure

representing equivalence classes of e-class IDs, with the usual operations

find

add

and

merge

. An e-class ID

is canonical if

find(U,e)=e

; an e-node

f(i_1,\ldots,i_n)

is canonical if each

i_j

is canonical (

1,\ldots,n

An association of e-class IDs with sets of e-nodes, called e-classes. This consists of

(i.e. a mapping) from canonical e-nodes to e-class IDs, and

- an e-class map

that maps e-class IDs to e-classes, such that

maps equivalent IDs to the same set of e-nodes:

\foralli,j\inid,M[i]=M[j]\Leftrightarrowfind(U,i)=find(U,j)

Invariants

In addition to the above structure, a valid e-graph conforms to several data structure invariants. Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes

f(i_1,\ldots,i_n),f(j_1,\ldots,j_n)

are congruent when

find(U,i_k)=find(U,j_k),k\in\{1,\ldots,n\}

. The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.

Operations

E-graphs expose wrappers around the

add

find

, and

merge

operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.

Equivalent formulations

G=(N\uplusid,E)

where

is the set of e-class IDs (as above),

is the set of e-nodes, and

E\subseteq(id x N)\cup(N x id)

is a set of directed edges.

There is a directed edge from each e-class to each of its members, and from each e-node to each of its children.

E-matching

Let

be a set of variables and let

Term(\Sigma,V)

be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words,

Term(\Sigma,V)

is the smallest set such that

V\subsetTerm(\Sigma,V)

\Sigma_{0\subsetTerm(\Sigma,}V)

, and when

x_1,\ldots,x_n\inTerm(\Sigma,V)

and

f\in\Sigma_n

, then

f(x_1,\ldots,x_{n)\inTerm(\Sigma,}V)

. A term containing variables is called a pattern, a term without variables is called ground.

An e-graph

represents a ground term

t\inTerm(\Sigma,\emptyset)

if one of its e-classes represents

. An e-class

represents

if some e-node

f(i_1,\ldots,i_n)\inC

does. An e-node

f(i_1,\ldots,i_n)\inC

represents a term

g(j_1,\ldots,j_n)

f=g

and each e-class

M[i_k]

represents the term

j_k

(

1,\ldots,n

e-matching is an operation that takes a pattern

p\inTerm(\Sigma,V)

and an e-graph

, and yields all pairs

(\sigma,C)

where

\sigma\subsetV x id

is a substitution mapping the variables in

to e-class IDs and

C\inid

is an e-class ID such that the term

\sigma(p)

is represented by

. There are several known algorithms for e-matching,^[1] the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal.^[2]

Extraction

Given an e-class and a cost function that maps each function symbol in

\Sigma

to a natural number, the extraction problem is to find a ground term with minimal total cost that is represented by the given e-class. This problem is NP-hard.^[3] There is also no constant-factor approximation algorithm for this problem, which can be shown by reduction from the set cover problem. However, for graphs with bounded treewidth, there is a linear-time, fixed-parameter tractable algorithm.

Complexity

An e-graph with n equalities can be constructed in O(n log n) time.

Equality saturation

Equality saturation is a technique for building optimizing compilers using e-graphs. It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.

Applications

E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3^[4] and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers. In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates. E-graphs are also used in the Simplify theorem prover of ESC/Java.^[5]

Equality saturation is used in specialized optimizing compilers,^[6] e.g. for deep learning^[7] and linear algebra.^[8] Equality saturation has also been used for translation validation applied to the LLVM toolchain.^[9]

E-graphs have been applied to several problems in program analysis, including fuzzing,^[10] abstract interpretation,^[11] and library learning.^[12]

References

Book: de Moura. Leonardo. Bjørner. Nikolaj. Automated Deduction – CADE-21 . Efficient E-Matching for SMT Solvers . 2007. Pfenning. Frank. https://link.springer.com/chapter/10.1007/978-3-540-73595-3_13. Lecture Notes in Computer Science. 4603. en. Berlin, Heidelberg. Springer. 183–198. 10.1007/978-3-540-73595-3_13. 978-3-540-73595-3.
Willsey. Max. Nandi. Chandrakana. Wang. Yisu Remy. Flatt. Oliver. Tatlock. Zachary. Panchekha. Pavel. 2021-01-04. egg: Fast and extensible equality saturation. Proceedings of the ACM on Programming Languages. 5. POPL. 23:1–23:29. 10.1145/3434304. 2004.03082. 226282597.
Book: Tate. Ross. Stepp. Michael. Tatlock. Zachary. Lerner. Sorin. Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages . Equality saturation . 2009-01-21. https://doi.org/10.1145/1480881.1480915. POPL '09. Savannah, GA, USA. Association for Computing Machinery. 264–276. 10.1145/1480881.1480915. 978-1-60558-379-2. 2138086.
Book: Flatt . Oliver . Coward . Samuel . Willsey . Max . Tatlock . Zachary . Panchekha . Pavel . October 2022 . Small Proofs from Congruence Closure . A. Griggio . N. Rungta . Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022 . https://repositum.tuwien.at/handle/20.500.12708/81325 . en . TU Wien Academic Press . 75–83 . 10.34727/2022/isbn.978-3-85448-053-2_13 . 978-3-85448-053-2. 252118847 .
Goharshady . Amir Kafshdar . Lam . Chun Kit . Parreaux . Lionel . 2024-10-08 . Fast and Optimal Extraction for Sparse Equality Graphs . Proceedings of the ACM on Programming Languages . 8 . OOPSLA2 . 361:2551–361:2577 . 10.1145/3689801.

External links

Notes and References

Moskal . Michał . Łopuszański . Jakub . Kiniry . Joseph R. . 2008-05-06 . E-matching for Fun and Profit . Electronic Notes in Theoretical Computer Science . Proceedings of the 5th International Workshop on Satisfiability Modulo Theories (SMT 2007) . en . 198 . 2 . 19–35 . 10.1016/j.entcs.2008.04.078 . 1571-0661. free .
Zhang . Yihong . Wang . Yisu Remy . Willsey . Max . Tatlock . Zachary . 2022-01-12 . Relational e-matching . Proceedings of the ACM on Programming Languages . 6 . POPL . 35:1–35:22 . 10.1145/3498696. 236924583 . free .
Stepp . Michael Benjamin . Equality saturation: engineering challenges and applications . 2011 . PhD . University of California at San Diego . USA . 978-1-267-03827-2.
Book: de Moura. Leonardo. Bjørner. Nikolaj. Tools and Algorithms for the Construction and Analysis of Systems . Z3: An Efficient SMT Solver . 2008. Ramakrishnan. C. R.. Rehof. Jakob. Lecture Notes in Computer Science. 4963. en. Berlin, Heidelberg. Springer. 337–340. 10.1007/978-3-540-78800-3_24. 978-3-540-78800-3. free.
Detlefs . David . Nelson . Greg . Saxe . James B. . May 2005 . Simplify: a theorem prover for program checking . Journal of the ACM . 52 . 3 . 365–473 . 10.1145/1066100.1066102 . 9613854 . 0004-5411.
Joshi. Rajeev. Nelson. Greg. Randall. Keith. 2002-05-17. Denali: a goal-directed superoptimizer. ACM SIGPLAN Notices. 37. 5. 304–314. 10.1145/543552.512566. 0362-1340.
Yang. Yichen. Phothilimtha. Phitchaya Mangpo. Wang. Yisu Remy. Willsey. Max. Roy. Sudip. Pienaar. Jacques. 2021-03-17. Equality Saturation for Tensor Graph Superoptimization. cs.AI. 2101.01332.
Wang. Yisu Remy. Hutchison. Shana. Leang. Jonathan. Howe. Bill. Suciu. Dan. 2020-12-22. SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra. cs.DB. 2002.07951.
Book: Stepp. Michael. Tate. Ross. Lerner. Sorin. Computer Aided Verification . Equality-Based Translation Validator for LLVM . 2011. Gopalakrishnan. Ganesh. Qadeer. Shaz. Lecture Notes in Computer Science. 6806. en. Berlin, Heidelberg. Springer. 737–742. 10.1007/978-3-642-22110-1_59. 978-3-642-22110-1. free.
Web site: Wasm-mutate: Fuzzing WebAssembly Compilers with E-Graphs (EGRAPHS 2022) - PLDI 2022 . 2023-02-03 . pldi22.sigplan.org.
Coward . Samuel . Constantinides . George A. . Drane . Theo . 2022-03-17 . Abstract Interpretation on E-Graphs . cs.LO . 2203.09191 .
Coward . Samuel . Constantinides . George A. . Drane . Theo . 2022-05-30 . Combining E-Graphs with Abstract Interpretation . cs.DS . 2205.14989 .
Cao . David . Kunkel . Rose . Nandi . Chandrakana . Willsey . Max . Tatlock . Zachary . Polikarpova . Nadia . 2023-01-09 . babble: Learning Better Abstractions with E-Graphs and Anti-Unification . Proceedings of the ACM on Programming Languages . 7 . POPL . 396–424 . 10.1145/3571207 . 2212.04596 . 254536022 . 2475-1421.